RE: [PATCH][GCC][front-end][opt-framework] Update options framework for parameters to properly handle and validate configure time params. [Patch (2/3)]

2018-10-02 Thread Alexander Monakov
Hello,

On Tue, 24 Jul 2018, tamar.christ...@arm.com wrote:
> 
>   * params.c (validate_param): New.
>   (add_params): Use it.
>   (set_param_value): Refactor param validation into validate_param.
>   (diagnostic.h): Include.
>   * diagnostic.h (diagnostic_ready_p): New.

this patch was committed to trunk recently, and an automated email from
Coverity static checker has pointed out a useless self-assignment in a new
loop. It seems wrong indeed.

@@ -68,12 +73,26 @@ add_params (const param_info params[], size_t n)
[...]
+
+  /* Now perform some validation and set the value if it validates.  */
+  for (size_t i = 0; i < n; i++)
+{
+   if (validate_param (dst_params[i].default_value, dst_params[i], (int)i))
+ dst_params[i].default_value = dst_params[i].default_value;
+}
 }
 

Alexander


Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-02 Thread Martin Jambor
Hi,

my apologies for being terse, I'm in a meeting.

On Mon, Oct 01 2018, Qing Zhao wrote:
> Hi, Martin,
>
> I have studied a little more on
>
> https://github.com/marxin/kgraft-analysis-tool/blob/master/README.md 
> 
>
> in the Section “Usages”, from the example, we can see:
>
> the tool will report a list of affected functions for a function that will be 
> patched.
> In this list, it includes all callers of the patched function, and the cloned 
> functions from the patched function due to ipa const-propagation or ipa sra. 
>
> My question:
>
> what’s the current action to handle the cloned functions from the
> patched function due to ipa const-propagation or ipa sra, etc?

If we want to patch an inlined, cloned, or IPA-SRAed function, we also
patch all of its callers.

>
> since those cloned functions are NOT in the source code level, how to 
> generate the patches for the cloned functions? how to guarantee that after 
> the patched function is changed, the same ipa const-propagation or ipa
> sra will still happen?

You don't.

Martin

>
> a little confused here.
>
> thanks.
>
> Qing
>> On Sep 27, 2018, at 7:19 AM, Martin Jambor  wrote:
>> 
>> Hi,
>> 
>> (this message is a part of the thread originating with
>> https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01018.html)
>> 
>> On Thu, Sep 27 2018, Jan Hubicka wrote:
> If you make this to be INTERPOSABLE (which means it can be replaced by 
> different
> implementation by linker and that is probably what we want for live 
> patching)
> then also inliner, ipa-sra and other optimization will give up on these.
 
 do you suggest that to set the global function as AVAIL_INTERPOSABLE when 
 -finline-only-static 
 is present? then we should avoid all issues?
>>> 
>>> It seems to be reasonable direction I think, because it is what really 
>>> happens
>>> (well AVAIL_INTERPOSABLE still does not assume that the interposition will
>>> happen at runtime, but it is an approximation and we may introduce 
>>> something like
>>> AVAIL_RUNTIME_INTERPOSABLE if there is need for better difference).
>>> I wonder if -finline-only-static is a good name for the flag though, because 
>>> it
>>> does a lot more than that.  Maybe something like -flive-patching?
>>> How much is this all tied to one particular implementation of the feature?
>> 
>> We have just had a quick discussion with two upstream maintainers of
>> Linux kernel live-patching about this and the key points were:
>> 
>> 1. SUSE live-patch creators (and I assume all that use the upstream
>>   live-patching method) use Martin Liska's (somewhat under-documented)
>>   -fdump-ipa-clones option and a utility he wrote
>>   (https://github.com/marxin/kgraft-analysis-tool) to deal with all
>>   kinds of inlining, IPA-CP and generally all IPA optimizations that
>>   internally create a clone.  The tool tells them what happened and
>>   also lists all callers that need to be live-patched.
>> 
>> 2. However, there is growing concern about other IPA analyses that do
>>   not create a clone but still affect code generation in other
>>   functions.  Kernel developers have identified and disabled IPA-RA but
>>   there is more of them such as IPA-modref analysis, stack alignment
>>   propagation and possibly quite a few others which extract information
>>   from one function and use it in a caller or perhaps even some
>>   almost-unrelated functions (such as detection of read-only and
>>   write-only static global variables).
>> 
>>   The kernel live-patching community would welcome if GCC had an option
>>   that could disable all such optimizations/analyses for which it
>>   cannot provide a list of all affected functions (i.e. which ones need
>>   to be live-patched if a particular function is).
>> 
>>   I assume this is orthogonal to the proposed -finline-only-static
>>   option, but the above approach seems superior in all respects.
>> 
>> 3. The community would also like to be involved in these discussions,
>>   and therefore I am adding live-patch...@vger.kernel.org to CC.  On a
>>   related note, they will also have a live-patching mini-summit at the
>>   Linux Plumbers conference in Vancouver in November where they plan to
>>   discuss what they would like GCC to provide.
>> 
>> Thanks,
>> 
>> Martin
>> 


Re: [PATCH][rs6000][PR target/87474] fix strncmp expansion with -mno-power8-vector

2018-10-02 Thread Segher Boessenkool
On Mon, Oct 01, 2018 at 11:09:44PM -0500, Aaron Sawdey wrote:
> PR/87474 happens because I didn't check that both vector and VSX instructions
> were enabled, so insns that are disabled get generated with 
> -mno-power8-vector.

>   PR target/87474
>   * config/rs6000/rs6000-string.c (expand_strn_compare): Check that both
>   vector and VSX are enabled.

You mean "P8 vector" or similar, I think?


> --- gcc/config/rs6000/rs6000-string.c (revision 264760)
> +++ gcc/config/rs6000/rs6000-string.c (working copy)
> @@ -2205,6 +2205,7 @@
>  }
>else
>  {
> +  /* Implies TARGET_P8_VECTOR here. */

That isn't true as far as I see.


Okay for trunk with improved changelog and that stray line removed.
Thanks!


Segher


[PATCH] Adjust V[24]DF reduc_plus_scal patterns (was: RFC: x87 reduc_plus_scal_* AVX (and AVX512?) expanders)

2018-10-02 Thread Richard Biener
On Mon, 1 Oct 2018, Richard Biener wrote:

> 
> I notice that for Zen we create
> 
>   0.00 │   vhaddp %ymm3,%ymm3,%ymm3
>   1.41 │   vperm2 $0x1,%ymm3,%ymm3,%ymm1
>   1.45 │   vaddpd %ymm1,%ymm2,%ymm2
> 
> from reduc_plus_scal_v4df which uses a cross-lane permute vperm2f128
> even though the upper half of the result is unused in the end
> (we only use the single-precision element zero).  Much better would
> be to use vextractf128 which is well-pipelined and has good throughput
> (though using vhaddp in itself is quite bad for Zen I didn't try
> benchmarking it against open-coding that yet, aka disabling the
> expander).

So here's an actual patch that I benchmarked and compared against
the sequence generated by ICC when tuning for core-avx.  ICC seems
to avoid haddpd for both V4DF and V2DF in favor of using
vunpckhpd plus an add so this is what I do now.  Zen is also happy
with that in addition to the lower overall latency and higher
throughput "on paper" (consulting Agner's tables).

The disadvantage for the V2DF case is that it now needs two
registers compared to just one, the advantage is that we can
enable it for SSE2 where we previously got psrldq + addpd
and now unpckhpd + addpd (psrldq has similar latency but possibly the
wrong non-FP "domain"?).
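
For reference, this is the kind of source (my example, not from the mail) whose
vectorized reduction goes through these expanders, e.g. at -O3 -ffast-math -mavx:

  double
  sum (const double *a, int n)
  {
    double s = 0.0;
    for (int i = 0; i < n; i++)
      s += a[i];   /* the reduction epilogue uses reduc_plus_scal_v4df/v2df */
    return s;
  }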

It speeds up 482.sphinx3 on Zen with -mprefer-avx256 by ~7% (with
-mprefer-avx128 the difference is in the noise).  The major issue
with the old sequence seems to be throughput of hadd and vperm2f128
with the two parallel reductions 482.sphinx3 performs resulting in
even worse latency while with the new sequence two reduction sequences
can be fully carried out in parallel (on Zen).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK for trunk?

The v8sf/v4sf reduction sequences and ix86_expand_reduc probably
benefit from similar treatment (ix86_expand_reduc from first reducing
%zmm -> %ymm -> %xmm).

Thanks,
Richard.

2018-10-02  Richard Biener  

* config/i386/sse.md (reduc_plus_scal_v4df): Avoid the use
of haddv4df, first reduce to SSE width and exploit the fact
that we only need element zero with the reduction result.
(reduc_plus_scal_v2df): Likewise.

Index: gcc/config/i386/sse.md
===
--- gcc/config/i386/sse.md  (revision 264758)
+++ gcc/config/i386/sse.md  (working copy)
@@ -2473,24 +2473,28 @@ (define_expand "reduc_plus_scal_v4df"
(match_operand:V4DF 1 "register_operand")]
   "TARGET_AVX"
 {
-  rtx tmp = gen_reg_rtx (V4DFmode);
-  rtx tmp2 = gen_reg_rtx (V4DFmode);
-  rtx vec_res = gen_reg_rtx (V4DFmode);
-  emit_insn (gen_avx_haddv4df3 (tmp, operands[1], operands[1]));
-  emit_insn (gen_avx_vperm2f128v4df3 (tmp2, tmp, tmp, GEN_INT (1)));
-  emit_insn (gen_addv4df3 (vec_res, tmp, tmp2));
-  emit_insn (gen_vec_extractv4dfdf (operands[0], vec_res, const0_rtx));
+  rtx tmp = gen_reg_rtx (V2DFmode);
+  emit_insn (gen_vec_extract_hi_v4df (tmp, operands[1]));
+  rtx tmp2 = gen_reg_rtx (V2DFmode);
+  emit_insn (gen_addv2df3 (tmp2, tmp, gen_lowpart (V2DFmode, operands[1])));
+  rtx tmp3 = gen_reg_rtx (V2DFmode);
+  emit_insn (gen_vec_interleave_highv2df (tmp3, tmp2, tmp2));
+  emit_insn (gen_adddf3 (operands[0],
+gen_lowpart (DFmode, tmp2),
+gen_lowpart (DFmode, tmp3)));
   DONE;
 })
 
 (define_expand "reduc_plus_scal_v2df"
   [(match_operand:DF 0 "register_operand")
(match_operand:V2DF 1 "register_operand")]
-  "TARGET_SSE3"
+  "TARGET_SSE2"
 {
   rtx tmp = gen_reg_rtx (V2DFmode);
-  emit_insn (gen_sse3_haddv2df3 (tmp, operands[1], operands[1]));
-  emit_insn (gen_vec_extractv2dfdf (operands[0], tmp, const0_rtx));
+  emit_insn (gen_vec_interleave_highv2df (tmp, operands[1], operands[1]));
+  emit_insn (gen_adddf3 (operands[0],
+gen_lowpart (DFmode, tmp),
+gen_lowpart (DFmode, operands[1])));
   DONE;
 })
 

Re: [patch] Fix PR tree-optimization/86659

2018-10-02 Thread Eric Botcazou
> so the fix is to simply not optimize here?

Yes, we cannot turn a BIT_FIELD_REF with rev-storage into a VIEW_CONVERT_EXPR.

> Are there correctness issues with the patterns we have for rev-storage?  But
> then some cases are let through via the realpart/imagpart/v_c_e case?  I
> suppose we should never see REF_REVERSE_STORAGE_ORDER on refs operating on
> registers (SSA_NAMEs or even is_gimple_reg()s)?

REF_REVERSE_STORAGE_ORDER is set on BIT_FIELD_REF and MEM_REF only.

> Note that I think you need to adjust the GENERIC side as well, for example
>
> [...]
>
> where we lose the reverse-storage attribute as well.  You'd probably
> have to cut out rev-storage refs somewhere in genmatch.c.

I don't think that's necessary since we never fold top-level BIT_FIELD_REFs at 
the GENERIC level so far.  And given that it's impossible to control the top 
level in match.pd, I'd rather not try something impossible unless forced to 
do it.  This ought to be controlled directly from fold-const.c, if need be.

-- 
Eric Botcazou


Re: [PATCH] Adjust V[24]DF reduc_plus_scal patterns (was: RFC: x87 reduc_plus_scal_* AVX (and AVX512?) expanders)

2018-10-02 Thread Jan Hubicka
> On Mon, 1 Oct 2018, Richard Biener wrote:
> 
> > 
> > I notice that for Zen we create
> > 
> >   0.00 │   vhaddp %ymm3,%ymm3,%ymm3
> >   1.41 │   vperm2 $0x1,%ymm3,%ymm3,%ymm1
> >   1.45 │   vaddpd %ymm1,%ymm2,%ymm2
> > 
> > from reduc_plus_scal_v4df which uses a cross-lane permute vperm2f128
> > even though the upper half of the result is unused in the end
> > (we only use the single-precision element zero).  Much better would
> > be to use vextractf128 which is well-pipelined and has good throughput
> > (though using vhaddp in itself is quite bad for Zen I didn't try
> > benchmarking it against open-coding that yet, aka disabling the
> > expander).
> 
> So here's an actual patch that I benchmarked and compared against
> the sequence generated by ICC when tuning for core-avx.  ICC seems
> to avoid haddpd for both V4DF and V2DF in favor of using
> vunpckhpd plus an add so this is what I do now.  Zen is also happy
> with that in addition to the lower overall latency and higher
> throughput "on the paper" (consulting Agners tables).
> 
> The disadvantage for the V2DF case is that it now needs two
> registers compared to just one, the advantage is that we can
> enable it for SSE2 where we previously got psrldq + addpd
> and now unpckhpd + addpd (psrldq has similar latency but eventually the
> wrong non-FP "domain"?).
> 
> It speeds up 482.sphinx3 on Zen with -mprefer-avx256 by ~7% (with
> -mprefer-avx128 the difference is in the noise).  The major issue
> with the old sequence seems to be throughput of hadd and vperm2f128
> with the two parallel reductions 482.sphinx3 performs resulting in
> even worse latency while with the new sequence two reduction sequences
> can be fully carried out in parallel (on Zen).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> OK for trunk?
> 
> The v8sf/v4sf reduction sequences and ix86_expand_reduc probably
> benefit from similar treatment (ix86_expand_reduc from first reducing
> %zmm -> %ymm -> %xmm).
> 
> Thanks,
> Richard.
> 
> 2018-10-02  Richard Biener  
> 
>   * config/i386/sse.md (reduc_plus_scal_v4df): Avoid the use
>   of haddv4df, first reduce to SSE width and exploit the fact
>   that we only need element zero with the reduction result.
>   (reduc_plus_scal_v2df): Likewise.

OK,
thanks!
Honza
> 
> Index: gcc/config/i386/sse.md
> ===
> --- gcc/config/i386/sse.md(revision 264758)
> +++ gcc/config/i386/sse.md(working copy)
> @@ -2473,24 +2473,28 @@ (define_expand "reduc_plus_scal_v4df"
> (match_operand:V4DF 1 "register_operand")]
>"TARGET_AVX"
>  {
> -  rtx tmp = gen_reg_rtx (V4DFmode);
> -  rtx tmp2 = gen_reg_rtx (V4DFmode);
> -  rtx vec_res = gen_reg_rtx (V4DFmode);
> -  emit_insn (gen_avx_haddv4df3 (tmp, operands[1], operands[1]));
> -  emit_insn (gen_avx_vperm2f128v4df3 (tmp2, tmp, tmp, GEN_INT (1)));
> -  emit_insn (gen_addv4df3 (vec_res, tmp, tmp2));
> -  emit_insn (gen_vec_extractv4dfdf (operands[0], vec_res, const0_rtx));
> +  rtx tmp = gen_reg_rtx (V2DFmode);
> +  emit_insn (gen_vec_extract_hi_v4df (tmp, operands[1]));
> +  rtx tmp2 = gen_reg_rtx (V2DFmode);
> +  emit_insn (gen_addv2df3 (tmp2, tmp, gen_lowpart (V2DFmode, operands[1])));
> +  rtx tmp3 = gen_reg_rtx (V2DFmode);
> +  emit_insn (gen_vec_interleave_highv2df (tmp3, tmp2, tmp2));
> +  emit_insn (gen_adddf3 (operands[0],
> +  gen_lowpart (DFmode, tmp2),
> +  gen_lowpart (DFmode, tmp3)));
>DONE;
>  })
>  
>  (define_expand "reduc_plus_scal_v2df"
>[(match_operand:DF 0 "register_operand")
> (match_operand:V2DF 1 "register_operand")]
> -  "TARGET_SSE3"
> +  "TARGET_SSE2"
>  {
>rtx tmp = gen_reg_rtx (V2DFmode);
> -  emit_insn (gen_sse3_haddv2df3 (tmp, operands[1], operands[1]));
> -  emit_insn (gen_vec_extractv2dfdf (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_interleave_highv2df (tmp, operands[1], operands[1]));
> +  emit_insn (gen_adddf3 (operands[0],
> +  gen_lowpart (DFmode, tmp),
> +  gen_lowpart (DFmode, operands[1])));
>DONE;
>  })
>  



Small tweak to reorg.c

2018-10-02 Thread Eric Botcazou
This changes make_return_insns to use emit_copy_of_insn_after when it needs
to re-emit insns that were put in a delay slot, like the other routines doing
the same thing in this file, and to re-emit the jump insn directly.

Tested on SPARC/Solaris, applied on the mainline.


2018-10-02  Eric Botcazou  

* reorg.c (make_return_insns): Use emit_copy_of_insn_after for the
insns in the delay slot and add_insn_after for the jump insn.

-- 
Eric Botcazou
Index: reorg.c
===
--- reorg.c	(revision 264732)
+++ reorg.c	(working copy)
@@ -3638,18 +3638,13 @@ make_return_insns (rtx_insn *first)
 	 insns for its delay slots, if it needs some.  */
   if (ANY_RETURN_P (PATTERN (jump_insn)))
 	{
-	  rtx_insn *prev = PREV_INSN (insn);
+	  rtx_insn *after = PREV_INSN (insn);
 
 	  delete_related_insns (insn);
-	  for (i = 1; i < XVECLEN (pat, 0); i++)
-	{
-	  rtx_insn *in_seq_insn = as_a <rtx_insn *> (XVECEXP (pat, 0, i));
-	  prev = emit_insn_after_setloc (PATTERN (in_seq_insn), prev,
-	 INSN_LOCATION (in_seq_insn));
-	}
-
-	  insn = emit_jump_insn_after_setloc (PATTERN (jump_insn), prev,
-	  INSN_LOCATION (jump_insn));
+	  insn = jump_insn;
+	  for (i = 1; i < pat->len (); i++)
+	after = emit_copy_of_insn_after (pat->insn (i), after);
+	  add_insn_after (insn, after, NULL);
 	  emit_barrier_after (insn);
 
 	  if (slots)


Re: [PATCH, GCC/ARM] Fix PR87374: ICE with -mslow-flash-data and -mword-relocations

2018-10-02 Thread Thomas Preudhomme
Hi Ramana,

On Thu, 27 Sep 2018 at 11:14, Ramana Radhakrishnan
 wrote:
>
> On 27/09/2018 09:26, Kyrill Tkachov wrote:
> > Hi Thomas,
> >
> > On 26/09/18 18:39, Thomas Preudhomme wrote:
> >> Hi,
> >>
> >> GCC ICEs under -mslow-flash-data and -mword-relocations because there
> >> is no way to load an address, both literal pools and MOVW/MOVT being
> >> forbidden. This patch gives an error message when both options are
> >> specified by the user and adds the according dg-skip-if directives for
> >> tests that use either of these options.
> >>
> >> ChangeLog entries are as follows:
> >>
> >> *** gcc/ChangeLog ***
> >>
> >> 2018-09-25  Thomas Preud'homme  
> >>
> >>   PR target/87374
> >>   * config/arm/arm.c (arm_option_check_internal): Disable the combined
> >>   use of -mslow-flash-data and -mword-relocations.
> >>
> >> *** gcc/testsuite/ChangeLog ***
> >>
> >> 2018-09-25  Thomas Preud'homme  
> >>
> >>   PR target/87374
> >>   * gcc.target/arm/movdi_movt.c: Skip if both -mslow-flash-data and
> >>   -mword-relocations would be passed when compiling the test.
> >>   * gcc.target/arm/movsi_movt.c: Likewise.
> >>   * gcc.target/arm/pr81863.c: Likewise.
> >>   * gcc.target/arm/thumb2-slow-flash-data-1.c: Likewise.
> >>   * gcc.target/arm/thumb2-slow-flash-data-2.c: Likewise.
> >>   * gcc.target/arm/thumb2-slow-flash-data-3.c: Likewise.
> >>   * gcc.target/arm/thumb2-slow-flash-data-4.c: Likewise.
> >>   * gcc.target/arm/thumb2-slow-flash-data-5.c: Likewise.
> >>   * gcc.target/arm/tls-disable-literal-pool.c: Likewise.
> >>
> >>
> >> Testing: Bootstrapped in Thumb-2 mode. No testsuite regression when
> >> targeting arm-none-eabi. Modified tests get skipped as expected when
> >> running the testsuite with -mslow-flash-data (pr81863.c) or
> >> -mword-relocations (all the others).
> >>
> >>
> >> Is this ok for trunk? I'd also appreciate guidance on whether this is
> >> worth a backport. It's a simple patch but on the other hand it only
> >> prevents some option combination, it does not fix anything so I have
> >> mixed feelings.
> >
> > In my opinion -mslow-flash-data is more of a tuning option rather than a 
> > security/ABI feature
> > and therefore erroring out on its combination with -mword-relocations feels 
> > odd.
> > I'm leaning more towards making -mword-relocations or any other option that 
> > really requires constant pools
> > to bypass/disable the effects of -mslow-flash-data instead.
>
> -mslow-flash-data and -mword-relocations are contradictory in their
> expectations. mslow-flash-data is for not putting anything in the
> literal pool whereas mword-relocations is purely around the use of movw
> / movt instructions for word sized values. I wish we had called
> -mslow-flash-data something else (probably -mno-literal-pools).
> -mslow-flash-data is used primarily by M-profile users and
> -mword-relocations IIUC was a point fix for use in the Linux kernel for
> module loads at a time when not all module loaders in the linux kernel
> were fixed for the movw / movt relocations and armv7-a / thumb2 was in
> it's infancy :). Thus they are used by different constituencies in
> general and I wouldn't see them used together by actual users.

Technically, -mslow-flash-data does not forbid literal pools, it just
discourages them because a load from flash is slower than a short
sequence of instructions. -mpure-code, on the other hand, reuses the
same logic and does forbid literal pools. We could treat
-mslow-flash-data differently but the question is whether it is worth
the trouble.

By the way, I've noticed that the documentation for -mword-relocations
says it defaults to on for -fpic and -fPIC, but when looking through
the code I saw that target_word_relocations is not set in these cases;
rather, the initial commit that introduced -mword-relocations also
checks flag_pic wherever it checks target_word_relocations. However, a
later commit added one more check for target_word_relocations but
nothing for flag_pic. I'm now consolidating this so that flag_pic sets
target_word_relocations. I'll do regression testing with -fPIC and
then post an updated patch.

>
> Considering the above, I would prefer a hard error rather than a warning
> as they are contradictory and I'd prefer that we error'd out. Further
> this bugzilla entry is probably created with fuzzing with a variety of
> options rather than from any real use case.
>
> Oh and yes, lets update invoke.texi while here.

Done. Will be part of the updated patch.

Best regards,

Thomas


Re: [PATCH, OpenACC] Properly handle wait clause with no arguments

2018-10-02 Thread Chung-Lin Tang

Ping (adding Thomas to CC as OpenACC maintainer)

On 2018/8/30 9:27 PM, Chung-Lin Tang wrote:

Hi, this patch properly handles OpenACC 'wait' clauses without arguments,
making them equivalent to "wait all".
(Current trunk basically discards and ignores such argument-less wait clauses.)
This adds additional handling in the pack/unpack of the wait argument across
the compiler/libgomp interface, but is done in a manner that doesn't affect
binary compatibility.
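
For illustration only (example mine, not from the patch; assumes -fopenacc):

  void
  f (float *a, int n)
  {
  #pragma acc parallel loop async(1) copy(a[0:n])
    for (int i = 0; i < n; i++)
      a[i] += 1.0f;

    /* An argument-less 'wait' clause waits for all outstanding asynchronous
       activity, i.e. every async queue, before this construct runs.  */
  #pragma acc parallel loop wait copy(a[0:n])
    for (int i = 0; i < n; i++)
      a[i] *= 2.0f;
  }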

This patch was part of the OpenACC async re-work that was done on the gomp4 
branch (later merged to OG7/OG8), see [1].
I'm separating this part out and submitting it first because it's logically 
independent.

[1] https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01842.html

Re-tested with offloading to ensure no regressions, is this okay for trunk?

Thanks,
Chung-Lin

2018-08-30  Chung-Lin Tang  

     gcc/c/
     * c-parser.c (c_parser_oacc_clause_wait): Add representation of wait
     clause without argument as 'wait (GOMP_ASYNC_NOVAL)', adjust comments.

     gcc/cp/
     * parser.c (cp_parser_oacc_clause_wait): Add representation of wait
     clause without argument as 'wait (GOMP_ASYNC_NOVAL)', adjust comments.

     gcc/fortran/
     * trans-openmp.c (gfc_trans_omp_clauses_1): Add representation of wait
     clause without argument as 'wait (GOMP_ASYNC_NOVAL)'.

     gcc/
     * omp-low.c (expand_omp_target): Add middle-end support for handling
     OMP_CLAUSE_WAIT clause with a GOMP_ASYNC_NOVAL(-1) as the argument.

     include/
     * gomp-constants.h (GOMP_LAUNCH_OP_MASK): Define.
     (GOMP_LAUNCH_PACK): Add bitwise-and of GOMP_LAUNCH_OP_MASK.
     (GOMP_LAUNCH_OP): Likewise.

     libgomp/
     * oacc-parallel.c (GOACC_parallel_keyed): Interpret launch op as
     signed 16-bit field, adjust num_waits handling.
     (GOACC_enter_exit_data): Adjust num_waits handling.
     (GOACC_update): Adjust num_waits handling.


Privatize do_jump and do_jump_1

2018-10-02 Thread Eric Botcazou
There is a single, convoluted use of do_jump outside dojump.c and no uses of 
do_jump_1 at all.

Tested on x86-64/Linux, applied on the mainline as obvious.


2018-10-02  Eric Botcazou  

* dojump.h (do_jump): Delete.
(do_jump_1): Likewise.
(split_comparison): Move around.
* dojump.c (do_jump): Make static.
(do_jump_1): Likewise.
(jumpifnot): Move around.
(jumpifnot_1): Likewise.
(jumpif): Likewise.
(jumpif_1): Likewise.
* expr.c (expand_expr_real_1): Call jumpif[not] instead of do_jump.

-- 
Eric Botcazou
Index: dojump.h
===
--- dojump.h	(revision 264732)
+++ dojump.h	(working copy)
@@ -56,29 +56,22 @@ extern void save_pending_stack_adjust (s
 
 extern void restore_pending_stack_adjust (saved_pending_stack_adjust *);
 
-/* Generate code to evaluate EXP and jump to LABEL if the value is zero.  */
-extern void jumpifnot (tree exp, rtx_code_label *label,
-		   profile_probability prob);
-extern void jumpifnot_1 (enum tree_code, tree, tree, rtx_code_label *,
-			 profile_probability);
+extern bool split_comparison (enum rtx_code, machine_mode,
+			  enum rtx_code *, enum rtx_code *);
 
 /* Generate code to evaluate EXP and jump to LABEL if the value is nonzero.  */
 extern void jumpif (tree exp, rtx_code_label *label, profile_probability prob);
 extern void jumpif_1 (enum tree_code, tree, tree, rtx_code_label *,
 		  profile_probability);
 
-/* Generate code to evaluate EXP and jump to IF_FALSE_LABEL if
-   the result is zero, or IF_TRUE_LABEL if the result is one.  */
-extern void do_jump (tree exp, rtx_code_label *if_false_label,
-		 rtx_code_label *if_true_label, profile_probability prob);
-extern void do_jump_1 (enum tree_code, tree, tree, rtx_code_label *,
-		   rtx_code_label *, profile_probability);
+/* Generate code to evaluate EXP and jump to LABEL if the value is zero.  */
+extern void jumpifnot (tree exp, rtx_code_label *label,
+		   profile_probability prob);
+extern void jumpifnot_1 (enum tree_code, tree, tree, rtx_code_label *,
+			 profile_probability);
 
 extern void do_compare_rtx_and_jump (rtx, rtx, enum rtx_code, int,
  machine_mode, rtx, rtx_code_label *,
  rtx_code_label *, profile_probability);
 
-extern bool split_comparison (enum rtx_code, machine_mode,
-			  enum rtx_code *, enum rtx_code *);
-
 #endif /* GCC_DOJUMP_H */
Index: dojump.c
===
--- dojump.c	(revision 264732)
+++ dojump.c	(working copy)
@@ -38,6 +38,8 @@ along with GCC; see the file COPYING3.
 #include "langhooks.h"
 
 static bool prefer_and_bit_test (scalar_int_mode, int);
+static void do_jump (tree, rtx_code_label *, rtx_code_label *,
+		 profile_probability);
 static void do_jump_by_parts_greater (scalar_int_mode, tree, tree, int,
   rtx_code_label *, rtx_code_label *,
   profile_probability);
@@ -118,38 +120,6 @@ restore_pending_stack_adjust (saved_pend
 }
 }
 
-/* Expand conditional expressions.  */
-
-/* Generate code to evaluate EXP and jump to LABEL if the value is zero.  */
-
-void
-jumpifnot (tree exp, rtx_code_label *label, profile_probability prob)
-{
-  do_jump (exp, label, NULL, prob.invert ());
-}
-
-void
-jumpifnot_1 (enum tree_code code, tree op0, tree op1, rtx_code_label *label,
-	 profile_probability prob)
-{
-  do_jump_1 (code, op0, op1, label, NULL, prob.invert ());
-}
-
-/* Generate code to evaluate EXP and jump to LABEL if the value is nonzero.  */
-
-void
-jumpif (tree exp, rtx_code_label *label, profile_probability prob)
-{
-  do_jump (exp, NULL, label, prob);
-}
-
-void
-jumpif_1 (enum tree_code code, tree op0, tree op1,
-	  rtx_code_label *label, profile_probability prob)
-{
-  do_jump_1 (code, op0, op1, NULL, label, prob);
-}
-
 /* Used internally by prefer_and_bit_test.  */
 
 static GTY(()) rtx and_reg;
@@ -197,7 +167,7 @@ prefer_and_bit_test (scalar_int_mode mod
OP0 CODE OP1 .  IF_FALSE_LABEL and IF_TRUE_LABEL like in do_jump.
PROB is probability of jump to if_true_label.  */
 
-void
+static void
 do_jump_1 (enum tree_code code, tree op0, tree op1,
 	   rtx_code_label *if_false_label, rtx_code_label *if_true_label,
 	   profile_probability prob)
@@ -417,7 +387,7 @@ do_jump_1 (enum tree_code code, tree op0
 
PROB is probability of jump to if_true_label.  */
 
-void
+static void
 do_jump (tree exp, rtx_code_label *if_false_label,
 	 rtx_code_label *if_true_label, profile_probability prob)
 {
@@ -946,6 +916,43 @@ split_comparison (enum rtx_code code, ma
 }
 }
 
+/* Generate code to evaluate EXP and jump to LABEL if the value is nonzero.
+   PROB is probability of jump to LABEL.  */
+
+void
+jumpif (tree exp, rtx_code_label *label, profile_probability prob)
+{
+  do_jump (exp, NULL, label, prob);
+}
+
+/* Similar to jumpif but dealing with exploded comparisons of the type
+   OP0 CODE OP1 .  LABEL and PROB

[PATCH,Fortran] Fix libgfortran/io/close.c for !HAVE_UNLINK_OPEN_FILE

2018-10-02 Thread Gerald Pfeifer
Revision r215307 | jb | 2014-09-16 23:40:28 +0200 (Di, 16 Sep 2014)

   PR libfortran/62768 Handle filenames with embedded null characters.
   :

made changes like the following to libgfortran/io/close.c

   #if !HAVE_UNLINK_OPEN_FILE
   - path = fc_strdup (u->file, u->file_len);
   + path = strdup (u->filename);
   #endif


One of our users now reported this build failure for a system where 
(for whatever reason) HAVE_UNLINK_OPEN_FILE is not defined:

   .../GCC-HEAD/libgfortran/io/close.c:94:11: error: implicit declaration of 
function ‘strdup’
   94 |path = strdup (u->filename);
  |   ^~


By #undef-ing HAVE_UNLINK_OPEN_FILE between the #include "..." and 
#include <...> statements in libgfortran/io/close.c I could reproduce 
this on FreeBSD 11/i386.

And I could validate the fix below, both with and without that #undef
in place.


Tested on i386-unknown-freebsd11.1.  


Okay to commit?

I'd also like to apply this to older release branches (down to GCC 6)
since it is obviously broken and the fix appears straightforward. If
approved, I'm thinking to wait about a week or two before making each 
step backwards (from HEAD to 8, 8 to 7, and 7 to 6).

Gerald


2018-10-02  Gerald Pfeifer  

* io/close.c [!HAVE_UNLINK_OPEN_FILE]: Include <string.h>.

Index: libgfortran/io/close.c
===
--- libgfortran/io/close.c  (revision 264772)
+++ libgfortran/io/close.c  (working copy)
@@ -26,6 +26,9 @@ see the files COPYING3 and COPYING.RUNTIME respect
 #include "unix.h"
 #include "async.h"
 #include 
+#if !HAVE_UNLINK_OPEN_FILE
+#include <string.h>
+#endif
 
 typedef enum
 { CLOSE_DELETE, CLOSE_KEEP, CLOSE_UNSPECIFIED }

Re: [PATCH] libstdc++: Remove unused define

2018-10-02 Thread Jonathan Wakely

On 01/10/18 23:01 +0200, Bernhard Reutner-Fischer wrote:

__NO_STRING_INLINES was removed from uClibc around 2004 so has no
effect.

Ok for trunk?


OK, thanks.



Re: Use -fno-show-column in libstdc++ installed testing

2018-10-02 Thread Jonathan Wakely

On 02/10/18 00:58 +0000, Joseph Myers wrote:

 arranged for
libstdc++ tests to use -fno-show-column by default, but only for
build-tree testing.  This patch adds it to the options used for
installed testing as well.

Tested with installed testing for a cross to x86_64-linux-gnu, where
it fixes various test failures.


Great, thanks.

This is OK for trunk, and I have no objections to also changing it for
the branches if you want it there.




Re: [PATCH, GCC/ARM] Fix PR87374: ICE with -mslow-flash-data and -mword-relocations

2018-10-02 Thread Ramana Radhakrishnan

On 02/10/2018 11:42, Thomas Preudhomme wrote:

Hi Ramana,

On Thu, 27 Sep 2018 at 11:14, Ramana Radhakrishnan
 wrote:


On 27/09/2018 09:26, Kyrill Tkachov wrote:

Hi Thomas,

On 26/09/18 18:39, Thomas Preudhomme wrote:

Hi,

GCC ICEs under -mslow-flash-data and -mword-relocations because there
is no way to load an address, both literal pools and MOVW/MOVT being
forbidden. This patch gives an error message when both options are
specified by the user and adds the according dg-skip-if directives for
tests that use either of these options.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-09-25  Thomas Preud'homme  

   PR target/87374
   * config/arm/arm.c (arm_option_check_internal): Disable the combined
   use of -mslow-flash-data and -mword-relocations.

*** gcc/testsuite/ChangeLog ***

2018-09-25  Thomas Preud'homme  

   PR target/87374
   * gcc.target/arm/movdi_movt.c: Skip if both -mslow-flash-data and
   -mword-relocations would be passed when compiling the test.
   * gcc.target/arm/movsi_movt.c: Likewise.
   * gcc.target/arm/pr81863.c: Likewise.
   * gcc.target/arm/thumb2-slow-flash-data-1.c: Likewise.
   * gcc.target/arm/thumb2-slow-flash-data-2.c: Likewise.
   * gcc.target/arm/thumb2-slow-flash-data-3.c: Likewise.
   * gcc.target/arm/thumb2-slow-flash-data-4.c: Likewise.
   * gcc.target/arm/thumb2-slow-flash-data-5.c: Likewise.
   * gcc.target/arm/tls-disable-literal-pool.c: Likewise.


Testing: Bootstrapped in Thumb-2 mode. No testsuite regression when
targeting arm-none-eabi. Modified tests get skipped as expected when
running the testsuite with -mslow-flash-data (pr81863.c) or
-mword-relocations (all the others).


Is this ok for trunk? I'd also appreciate guidance on whether this is
worth a backport. It's a simple patch but on the other hand it only
prevents some option combination, it does not fix anything so I have
mixed feelings.


In my opinion -mslow-flash-data is more of a tuning option rather than a 
security/ABI feature
and therefore erroring out on its combination with -mword-relocations feels odd.
I'm leaning more towards making -mword-relocations or any other option that 
really requires constant pools
to bypass/disable the effects of -mslow-flash-data instead.


-mslow-flash-data and -mword-relocations are contradictory in their
expectations. mslow-flash-data is for not putting anything in the
literal pool whereas mword-relocations is purely around the use of movw
/ movt instructions for word sized values. I wish we had called
-mslow-flash-data something else (probably -mno-literal-pools).
-mslow-flash-data is used primarily by M-profile users and
-mword-relocations IIUC was a point fix for use in the Linux kernel for
module loads at a time when not all module loaders in the linux kernel
were fixed for the movw / movt relocations and armv7-a / thumb2 was in
it's infancy :). Thus they are used by different constituencies in
general and I wouldn't see them used together by actual users.


Technically, -mslow-flash-data does not forbid literal pool, it just
discourages it because it's slower than many instructions. -mpure-code
on the other hand reuse the same logic and does forbid literal pools.
We could treat -mslow-flash-data differently but the question is
whether it is worth the trouble.


Well, yeah, I don't see the need for it. You could argue that
-mslow-flash-data can be porous, but realistically, if you want this as an
effective performance option, such porosity should be discouraged very
strongly ;)




By the way, I've noticed that the documentation for -mword-relocations
says it defaults to on for -fpic and -fPIC but when looking through
the code I saw that target_word_relocation is not set in these case,
rather the initial commit checks that introduced -mword-relocation
also checks for flag_pic when checking target_word_relocation. However
a later commit added one more check for target_word_relocations but
nothing for flag_pic. I'm now consolidating this so that flag_pic sets
target_word_relocations. I'll do a regression testing with -fPIC and
then post an updated patch.


I'm reasonably sure that's *not* going to have *any* effect on code 
generation as in the -fpic / -fPIC case we always produce the funny GOT 
unspecs and have never used movw / movt instructions in those sequences 
for addressing. If that had happened most of the world's dynamic 
libraries would have faulted by now because I don't think they can 
process absolute movw / movt relocations.



It is automatically implied by the fact that we never produced PC 
relative versions of the immediates that get put into movw / movt 
instructions. I don't even remember us having effective relocations to 
implement this but this is going back a few years now.



Sure that probably needs a comment rather than being implicit from the 
source or from my own head :)


Ramana


RE: [PATCH][GCC][front-end][opt-framework] Update options framework for parameters to properly handle and validate configure time params. [Patch (2/3)]

2018-10-02 Thread Tamar Christina
Hi Alexander,

> -Original Message-
> From: Alexander Monakov 
> Sent: Tuesday, October 2, 2018 08:01
> To: Tamar Christina 
> Cc: Jeff Law ; gcc-patches@gcc.gnu.org; nd
> ; jos...@codesourcery.com
> Subject: RE: [PATCH][GCC][front-end][opt-framework] Update options
> framework for parameters to properly handle and validate configure time
> params. [Patch (2/3)]
> 
> Hello,
> 
> On Tue, 24 Jul 2018, tamar.christ...@arm.com wrote:
> >
> > * params.c (validate_param): New.
> > (add_params): Use it.
> > (set_param_value): Refactor param validation into validate_param.
> > (diagnostic.h): Include.
> > * diagnostic.h (diagnostic_ready_p): New.
> 
> this patch was committed to trunk recently, and an automated email from
> Coverity static checker has pointed out a useless self-assignment in a new
> loop. It seems wrong indeed.
> 

It's not wrong, it's just unneeded. I'll write a patch to remove the assignment.

Thanks,
Tamar

> @@ -68,12 +73,26 @@ add_params (const param_info params[], size_t n) [...]
> +
> +  /* Now perform some validation and set the value if it validates.  */
> +  for (size_t i = 0; i < n; i++)
> +{
> +   if (validate_param (dst_params[i].default_value, dst_params[i], 
> (int)i))
> +   dst_params[i].default_value = dst_params[i].default_value;
> +}
>  }
> 
> 
> Alexander


[PATCH][GCC][front-end][opt-framework] Remove superfluous assignment in add_params.

2018-10-02 Thread Tamar Christina
Hi All,

This fixes the superfluous assignment that Coverity reported in add_params,
and changes the starting index from 0 to num_params - n in order for it to
work properly if add_params is called multiple times.

validate_params calls error so it doesn't matter that we don't check the
results here.  The results are checked in individual parameter updates after
front-end initialization.

bootstrapped and reg-tested on aarch64-none-linux-gnu and no issues.
bootstrapped x86_64-pc-linux-gnu and no issues.

Manually modified params.def to contain invalid values and code still works
as expected.  Manually passed invalid params and still errors out doing target
specific validations.  Testsuite also has existing validation tests.

Ok for trunk?

Thanks,
Tamar

gcc/:
2018-10-02  Tamar Christina  

* params.c (add_params): Fix initialization.

-- 
diff --git a/gcc/params.c b/gcc/params.c
index b6a33dfd6bf8c4df43fdac91e30ac6d082f39071..af473968e0b65a99d9ee179356c96bbfdadb46e7 100644
--- a/gcc/params.c
+++ b/gcc/params.c
@@ -87,12 +87,10 @@ add_params (const param_info params[], size_t n)
   if (!diagnostic_ready_p ())
 diagnostic_initialize (global_dc, 0);
 
-  /* Now perform some validation and set the value if it validates.  */
-  for (size_t i = 0; i < n; i++)
-{
-   if (validate_param (dst_params[i].default_value, dst_params[i], (int)i))
-	  dst_params[i].default_value = dst_params[i].default_value;
-}
+  /* Now perform some validation and validation failures trigger an error so
+ initialization will stop.  */
+  for (size_t i = num_compiler_params - n; i < n; i++)
+validate_param (params[i].default_value, params[i], (int)i);
 }
 
 /* Add all parameters and default values that can be set in both the



Re: [PATCH][rs6000][PR target/87474] fix strncmp expansion with -mno-power8-vector

2018-10-02 Thread Aaron Sawdey



On 10/2/18 3:38 AM, Segher Boessenkool wrote:
> On Mon, Oct 01, 2018 at 11:09:44PM -0500, Aaron Sawdey wrote:
>> PR/87474 happens because I didn't check that both vector and VSX instructions
>> were enabled, so insns that are disabled get generated with 
>> -mno-power8-vector.
> 
>>  PR target/87474
>>  * config/rs6000/rs6000-string.c (expand_strn_compare): Check that both
>>  vector and VSX are enabled.
> 
> You mean "P8 vector" or similar, I think?

True, it should say TARGET_P[89]_VECTOR.

> 
> 
>> --- gcc/config/rs6000/rs6000-string.c	(revision 264760)
>> +++ gcc/config/rs6000/rs6000-string.c	(working copy)
>> @@ -2205,6 +2205,7 @@
>>  }
>>else
>>  {
>> +  /* Implies TARGET_P8_VECTOR here. */
> 
> That isn't true as far as I see.

We can only enter emit_final_str_compare_vec() if TARGET_P8_VECTOR is set.
That's the additional check this patch adds. So in this function you can have
both P8 and P9 vector, or just p8 vector. rs6000_option_override_internal()
enforces that P8 vector must be set if P9 vector is set. So in the else here
we know that only p8 vector is set.

> 
> 
> Okay for trunk with improved changelog and that stray line removed.
> Thanks!
> 
> 
> Segher
> 

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH][rs6000][PR target/87474] fix strncmp expansion with -mno-power8-vector

2018-10-02 Thread Segher Boessenkool
On Tue, Oct 02, 2018 at 08:24:31AM -0500, Aaron Sawdey wrote:
> >> @@ -2205,6 +2205,7 @@
> >>  }
> >>else
> >>  {
> >> +  /* Implies TARGET_P8_VECTOR here. */
> > 
> > That isn't true as far as I see.
> 
> We can only enter emit_final_str_compare_vec() if TARGET_P8_VECTOR is set.
> That's the additional check this patch adds. So in this function you can have
> both P8 and P9 vector, or just p8 vector. rs6000_option_override_internal()
> enforces that P8 vector must be set if P9 vector is set. So in the else here
> we know that only p8 vector is set.

It's not obvious.  Add an assert, maybe?  Instead of the comment :-)
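
Something like this, perhaps (a sketch of the idea only, not the committed
change; the exact placement in the else branch is up to you):

      /* P9 vector implies P8 vector, and that case is handled above, so only
	 P8 vector can be in effect here.  */
      gcc_assert (TARGET_P8_VECTOR && !TARGET_P9_VECTOR);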


Segher


Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-02 Thread Qing Zhao


> On Oct 2, 2018, at 3:33 AM, Martin Jambor  wrote:
> 
> Hi,
> 
> my apologies for being terse, I'm in a meeting.
> 
> On Mon, Oct 01 2018, Qing Zhao wrote:
>> Hi, Martin,
>> 
>> I have studied a little more on
>> 
>> https://github.com/marxin/kgraft-analysis-tool/blob/master/README.md 
>> 
>> 
>> in the Section “Usages”, from the example, we can see:
>> 
>> the tool will report a list of affected functions for a function that will 
>> be patched.
>> In this list, it includes all callers of the patched function, and the 
>> cloned functions from the patched function due to ipa const-propogation or 
>> ipa sra. 
>> 
>> My question:
>> 
>> what’s the current action to handle the cloned functions from the
>> patched function due to ipa const-proposation or ipa sra, etc?
> 
> If we want to patch an inlined, cloned, or IPA-SRAed function, we also
> patch all of its callers.

take the example from the link:

$ gcc /home/marxin/Programming/linux/aesni-intel_glue.i -O2 -fdump-ipa-clones -c
$ ./kgraft-ipa-analysis.py aesni-intel_glue.i.000i.ipa-clones

[..skipped..]
Function: fls64/63 (./arch/x86/include/asm/bitops.h:479:90)
  inlining to: __ilog2_u64/132 (include/linux/log2.h:40:5)
inlining to: ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
  constprop: ablkcipher_request_alloc.constprop.8/3198 
(include/linux/crypto.h:979:82)
inlining to: helper_rfc4106_decrypt/3007 
(arch/x86/crypto/aesni-intel_glue.c:1016:12)
inlining to: helper_rfc4106_encrypt/3006 
(arch/x86/crypto/aesni-intel_glue.c:939:12)

  Affected functions: 5
__ilog2_u64/132 (include/linux/log2.h:40:5)
ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
ablkcipher_request_alloc.constprop.8/3198 (include/linux/crypto.h:979:82)
helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
helper_rfc4106_encrypt/3006 (arch/x86/crypto/aesni-intel_glue.c:939:12)
[..skipped..]


if we want to patch the function “fls64/63”, which other functions do we need
to patch, too? My guess is:

**all the callers:
__ilog2_u64/132
ablkcipher_request_alloc/1639
helper_rfc4106_decrypt/3007
helper_rfc4106_encrypt/3006 
**and:
ablkcipher_request_alloc.constprop.8/3198

Is the above correct?

how do we generate a patch for ablkcipher_request_alloc.constprop.8/3198, since
it’s not a function in the source code?

Qing

> 
>> 
>> since those cloned functions are NOT in the source code level, how to 
>> generate the patches for the cloned functions? how to guarantee that after 
>> the patched function is changed, the same ipa const-propogation or ipa
>> sra will still happened?
> 
> You don't.
> 
> Martin
> 
>> 



Re: Fold more boolean expressions

2018-10-02 Thread MCC CS
Thanks a lot!


[PATCH] Avoid redundant runtime checks in std::visit

2018-10-02 Thread Jonathan Wakely

Calling std::get will check some static assertions and also do a runtime
check for a valid index before calling __detail::__variant::__get. The
std::visit function already handles the case where any variant has an
invalid index, so __get can be used directly in __visit_invoke.
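
For reference, a small illustrative use of std::visit (example mine, not part
of the patch) showing the user-visible behaviour that stays the same:

  #include <iostream>
  #include <variant>

  int main()
  {
    std::variant<int, double> v = 3.14;
    // std::visit validates the variant's state up front (a valueless variant
    // makes it throw std::bad_variant_access), which is why the internal
    // __visit_invoke no longer needs std::get's redundant index check.
    std::visit ([](auto x) { std::cout << x << '\n'; }, v);
  }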

* include/std/variant (__gen_vtable_impl::__visit_invoke): Call __get
directly instead of get, as caller ensures correct index is used.
(holds_alternative, get, get_if): Remove redundant inline specifiers.
(_VARIANT_RELATION_FUNCTION_TEMPLATE): Likewise.

Tested x86_64-linux, committed to trunk.

commit 79a88f5377344677fae1a21028293e8943eea209
Author: Jonathan Wakely 
Date:   Tue Oct 2 14:12:20 2018 +0100

Avoid redundant runtime checks in std::visit

Calling std::get will check some static assertions and also do a runtime
check for a valid index before calling __detail::__variant::__get. The
std::visit function already handles the case where any variant has an
invalid index, so __get can be used directly in __visit_invoke.

* include/std/variant (__gen_vtable_impl::__visit_invoke): Call __get
directly instead of get, as caller ensures correct index is used.
(holds_alternative, get, get_if): Remove redundant inline specifiers.
(_VARIANT_RELATION_FUNCTION_TEMPLATE): Likewise.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 9289eef28cf..ff340cfc897 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -811,9 +811,8 @@ namespace __variant
{
  using _Alternative = variant_alternative_t<__index, _Next>;
  __element = __gen_vtable_impl<
-   remove_reference_t<
- decltype(__element)>, tuple<_Variants...>,
- std::index_sequence<__indices..., __index>>::_S_apply();
+   remove_reference_t<decltype(__element)>, tuple<_Variants...>,
+   std::index_sequence<__indices..., __index>>::_S_apply();
}
 };
 
@@ -826,11 +825,11 @@ namespace __variant
   using _Array_type =
  _Multi_array<_Result_type (*)(_Visitor&&, _Variants...)>;
 
-  decltype(auto)
-  static constexpr __visit_invoke(_Visitor&& __visitor, _Variants... __vars)
+  static constexpr decltype(auto)
+  __visit_invoke(_Visitor&& __visitor, _Variants... __vars)
   {
return std::__invoke(std::forward<_Visitor>(__visitor),
-   std::get<__indices>(std::forward<_Variants>(__vars))...);
+   __variant::__get<__indices>(std::forward<_Variants>(__vars))...);
   }
 
   static constexpr auto
@@ -871,8 +870,8 @@ namespace __variant
 } // namespace __detail
 
   template
-inline constexpr bool holds_alternative(const variant<_Types...>& __v)
-noexcept
+constexpr bool
+holds_alternative(const variant<_Types...>& __v) noexcept
 {
   static_assert(__detail::__variant::__exactly_once<_Tp, _Types...>,
"T should occur for exactly once in alternatives");
@@ -880,7 +879,7 @@ namespace __variant
 }
 
   template
-constexpr inline _Tp& get(variant<_Types...>& __v)
+constexpr _Tp& get(variant<_Types...>& __v)
 {
   static_assert(__detail::__variant::__exactly_once<_Tp, _Types...>,
"T should occur for exactly once in alternatives");
@@ -889,7 +888,7 @@ namespace __variant
 }
 
   template
-constexpr inline _Tp&& get(variant<_Types...>&& __v)
+constexpr _Tp&& get(variant<_Types...>&& __v)
 {
   static_assert(__detail::__variant::__exactly_once<_Tp, _Types...>,
"T should occur for exactly once in alternatives");
@@ -899,7 +898,7 @@ namespace __variant
 }
 
   template
-constexpr inline const _Tp& get(const variant<_Types...>& __v)
+constexpr const _Tp& get(const variant<_Types...>& __v)
 {
   static_assert(__detail::__variant::__exactly_once<_Tp, _Types...>,
"T should occur for exactly once in alternatives");
@@ -908,7 +907,7 @@ namespace __variant
 }
 
   template
-constexpr inline const _Tp&& get(const variant<_Types...>&& __v)
+constexpr const _Tp&& get(const variant<_Types...>&& __v)
 {
   static_assert(__detail::__variant::__exactly_once<_Tp, _Types...>,
"T should occur for exactly once in alternatives");
@@ -918,8 +917,7 @@ namespace __variant
 }
 
   template
-constexpr inline
-add_pointer_t>>
+constexpr add_pointer_t>>
 get_if(variant<_Types...>* __ptr) noexcept
 {
   using _Alternative_type = variant_alternative_t<_Np, variant<_Types...>>;
@@ -932,7 +930,7 @@ namespace __variant
 }
 
   template
-constexpr inline
+constexpr
 add_pointer_t>>
 get_if(const variant<_Types...>* __ptr) noexcept
 {
@@ -946,7 +944,7 @@ namespace __variant
 }
 
   template
-constexpr inline add_pointer_t<_Tp>
+constexpr add_pointer_t<_Tp>
 get_if(variant<_Type

[PATCHv3][PR 81376][PING] Remove unnecessary float casts in comparisons

2018-10-02 Thread Yuri Gribov
Hi all,

This is a second iteration of patch which gets rid of float casts in
comparisons when all values of casted integral type are exactly
representable by the float type
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81376). The new version
addresses issue spotted by Richard in previous version
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01119.html
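
As a reminder, this is the kind of comparison the pattern targets (example
mine, not from the patch):

  // All int and short values are exactly representable in double, so the
  // casts are redundant and the comparison can be done on the integer side,
  // roughly as  i < (int) j.
  bool
  cmp (int i, short j)
  {
    return (double) i < (double) j;
  }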

Bootstrapped and regtested on x86_64.

-Y
From 396be5ffa5e6bf919d78fa91885186876bce5461 Mon Sep 17 00:00:00 2001
From: Yury Gribov 
Date: Fri, 29 Sep 2017 07:34:54 +0200
Subject: [PATCH] Add pattern to remove useless float casts in comparison.

2018-09-18  Yury Gribov  

	PR middle-end/81376

gcc/
	* real.c (format_helper::can_represent_integral_type_p): New function
	* real.h (format_helper::can_represent_integral_type_p): Ditto.
	* match.pd: New pattern.

gcc/testsuite/
	* c-c++-common/pr81376.c: New test.
	* gcc.target/i386/387-ficom-1.c: Update test.
	* gcc.target/i386/387-ficom-2.c: Ditto.
---
 gcc/match.pd| 35 -
 gcc/real.c  | 13 
 gcc/real.h  |  1 +
 gcc/testsuite/c-c++-common/pr81376.c| 48 +
 gcc/testsuite/gcc.target/i386/387-ficom-1.c |  5 +--
 gcc/testsuite/gcc.target/i386/387-ficom-2.c |  5 +--
 6 files changed, 96 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr81376.c

diff --git a/gcc/match.pd b/gcc/match.pd
index be669ca..080b0d3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3348,6 +3348,32 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (! HONOR_NANS (@0))
 	(cmp @0 @1))
 
+/* Optimize various special cases of (FTYPE) N CMP (FTYPE) M.  */
+(for cmp (tcc_comparison)
+ (simplify
+  (cmp (float@0 @1) (float @2))
+   (if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (@0))
+	&& ! DECIMAL_FLOAT_TYPE_P (TREE_TYPE (@0)))
+(with
+ {
+   format_helper fmt (REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (@0))));
+   tree type1 = TREE_TYPE (@1);
+   bool type1_signed_p = TYPE_SIGN (type1) == SIGNED;
+   tree type2 = TREE_TYPE (@2);
+   bool type2_signed_p = TYPE_SIGN (type2) == SIGNED;
+ }
+ (if (fmt.can_represent_integral_type_p (type1)
+	  && fmt.can_represent_integral_type_p (type2))
+  (if (TYPE_PRECISION (type1) > TYPE_PRECISION (type2)
+   && type1_signed_p >= type2_signed_p)
+   (cmp @1 (convert @2))
+   (if (TYPE_PRECISION (type1) < TYPE_PRECISION (type2)
+&& type1_signed_p <= type2_signed_p)
+(cmp (convert:type2 @1) @2)
+(if (TYPE_PRECISION (type1) == TYPE_PRECISION (type2)
+ && type1_signed_p == type2_signed_p)
+	 (cmp @1 @2)
+
 /* Optimize various special cases of (FTYPE) N CMP CST.  */
 (for cmp  (lt le eq ne ge gt)
  icmp (le le eq ne ge ge)
@@ -3358,7 +3384,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (with
  {
tree itype = TREE_TYPE (@0);
-   signop isign = TYPE_SIGN (itype);
   format_helper fmt (REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (@1))));
const REAL_VALUE_TYPE *cst = TREE_REAL_CST_PTR (@1);
/* Be careful to preserve any potential exceptions due to
@@ -3368,17 +3393,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
bool exception_p
  = real_isnan (cst) && (cst->signalling
 || (cmp != EQ_EXPR && cmp != NE_EXPR));
-   /* INT?_MIN is power-of-two so it takes
-	  only one mantissa bit.  */
-   bool signed_p = isign == SIGNED;
-   bool itype_fits_ftype_p
-	 = TYPE_PRECISION (itype) - signed_p <= significand_size (fmt);
  }
  /* TODO: allow non-fitting itype and SNaNs when
 	-fno-trapping-math.  */
- (if (itype_fits_ftype_p && ! exception_p)
+ (if (fmt.can_represent_integral_type_p (itype) && ! exception_p)
   (with
{
+	 signop isign = TYPE_SIGN (itype);
 	 REAL_VALUE_TYPE imin, imax;
 	 real_from_integer (&imin, fmt, wi::min_value (itype), isign);
 	 real_from_integer (&imax, fmt, wi::max_value (itype), isign);
diff --git a/gcc/real.c b/gcc/real.c
index f822ae8..0cf4089 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -5176,6 +5176,19 @@ get_max_float (const struct real_format *fmt, char *buf, size_t len)
   gcc_assert (strlen (buf) < len);
 }
 
+/* True if all values of integral type can be represented
+   by this floating-point type exactly.  */
+
+bool format_helper::can_represent_integral_type_p (tree type) const
+{
+  gcc_assert (! decimal_p () && INTEGRAL_TYPE_P (type));
+
+  /* INT?_MIN is power-of-two so it takes
+ only one mantissa bit.  */
+  bool signed_p = TYPE_SIGN (type) == SIGNED;
+  return TYPE_PRECISION (type) - signed_p <= significand_size (*this);
+}
+
 /* True if mode M has a NaN representation and
the treatment of NaN operands is important.  */
 
diff --git a/gcc/real.h b/gcc/real.h
index 0ce4256..93db217 100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -216,6 +216,7 @@ public:
   operator const real_format *() const { return m_format; }
 
   boo

Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-02 Thread Martin Liška
On 10/2/18 3:28 PM, Qing Zhao wrote:
> 
>> On Oct 2, 2018, at 3:33 AM, Martin Jambor  wrote:
>>
>> Hi,
>>
>> my apologies for being terse, I'm in a meeting.
>>
>> On Mon, Oct 01 2018, Qing Zhao wrote:
>>> Hi, Martin,
>>>
>>> I have studied a little more on
>>>
>>> https://github.com/marxin/kgraft-analysis-tool/blob/master/README.md 
>>> 
>>>
>>> in the Section “Usages”, from the example, we can see:
>>>
>>> the tool will report a list of affected functions for a function that will 
>>> be patched.
>>> In this list, it includes all callers of the patched function, and the 
>>> cloned functions from the patched function due to ipa const-propogation or 
>>> ipa sra. 
>>>
>>> My question:
>>>
>>> what’s the current action to handle the cloned functions from the
>>> patched function due to ipa const-proposation or ipa sra, etc?
>>
>> If we want to patch an inlined, cloned, or IPA-SRAed function, we also
>> patch all of its callers.
> 
> take the example from the link:
> 
> $ gcc /home/marxin/Programming/linux/aesni-intel_glue.i -O2 -fdump-ipa-clones 
> -c
> $ ./kgraft-ipa-analysis.py aesni-intel_glue.i.000i.ipa-clones
> 
> [..skipped..]
> Function: fls64/63 (./arch/x86/include/asm/bitops.h:479:90)
>   inlining to: __ilog2_u64/132 (include/linux/log2.h:40:5)
> inlining to: ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
>   constprop: ablkcipher_request_alloc.constprop.8/3198 
> (include/linux/crypto.h:979:82)
> inlining to: helper_rfc4106_decrypt/3007 
> (arch/x86/crypto/aesni-intel_glue.c:1016:12)
> inlining to: helper_rfc4106_encrypt/3006 
> (arch/x86/crypto/aesni-intel_glue.c:939:12)
> 
>   Affected functions: 5
> __ilog2_u64/132 (include/linux/log2.h:40:5)
> ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
> ablkcipher_request_alloc.constprop.8/3198 (include/linux/crypto.h:979:82)
> helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
> helper_rfc4106_encrypt/3006 (arch/x86/crypto/aesni-intel_glue.c:939:12)
> [..skipped..]
> 
> 
> if we want to patch the function “fls64/63”,  what else functions we need to 
> patch, too? my guess is:

Hi.

Yes, 'Affected functions' is exactly the list of functions you want to patch.

> 
> **all the callers:
> __ilog2_u64/132
> ablkcipher_request_alloc/1639
> helper_rfc4106_decrypt/3007
> helper_rfc4106_encrypt/3006 
> **and:
> ablkcipher_request_alloc.constprop.8/3198
> is the above correct?
> 
> how to generate patch for ablkcipher_request_alloc.constprop.8/3198? since 
> it’s not a function in the source code?

Well, it's a 'static inline' function in a header file, so the function will
be inlined in all usages.
In this particular case there's no such caller function, so you're fine.

Martin

> 
> Qing
> 
>>
>>>
>>> since those cloned functions are NOT in the source code level, how to 
>>> generate the patches for the cloned functions? how to guarantee that after 
>>> the patched function is changed, the same ipa const-propogation or ipa
>>> sra will still happened?
>>
>> You don't.
>>
>> Martin
>>
>>>
> 



[patch,openacc] Use oacc_verify_routine_clauses for C/C++

2018-10-02 Thread Cesar Philippidis
This patch introduces a new oacc_verify_routine_clauses function that
reports errors if the user abuses the gang, worker and vector clauses
for acc routine directives in C/C++. Fortran is a little different,
because the FE has its own IR. So, while it would be possible to defer
checking for gang, worker, vector parallelism until a tree node is
created for a function, we'd still have problems of verifying the
parallelism for functions and subroutines defined and declared inside
modules. The C and C++ FE's are similar enough were they can share a
common function.
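
As a rough C illustration (the new test covers the real cases), the kind
of abuse this is meant to reject is a routine directive requesting more
than one level of parallelism, while a single level stays accepted:

/* Invalid: gang and worker cannot both be specified.  */
#pragma acc routine gang worker
void bad_routine (void);

/* Valid: exactly one level of parallelism.  */
#pragma acc routine vector
void good_routine (void);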

Is this OK for trunk? I bootstrapped and regression tested it for x86_64
Linux with nvptx offloading. This only touches the OpenACC code path.

Cesar
[OpenACC] Use oacc_verify_routine_clauses for C/C++

2018-XX-YY  Thomas Schwinge  
	Cesar Philippidis  

	gcc/
	* omp-general.c (oacc_build_routine_dims): Move some of its
	processing into...
	(oacc_verify_routine_clauses): ... this new function.
	* omp-general.h (oacc_verify_routine_clauses): New prototype.

	gcc/c/
	* c-parser.c (c_parser_oacc_routine): Normalize order of clauses.
	(c_finish_oacc_routine): Call oacc_verify_routine_clauses.

	gcc/cp/
	* parser.c (cp_parser_oacc_routine)
	(cp_parser_late_parsing_oacc_routine): Normalize order of clauses.
	(cp_finalize_oacc_routine): Call oacc_verify_routine_clauses.

	gcc/testsuite/
	* c-c++-common/goacc/routine-level-of-parallelism-1.c: New test.

(cherry picked from gomp-4_0-branch r239520)
---
 gcc/c/c-parser.c  |   8 +
 gcc/cp/parser.c   |   9 +
 gcc/omp-general.c |  69 -
 gcc/omp-general.h |   1 +
 .../goacc/routine-level-of-parallelism-1.c| 265 ++
 5 files changed, 342 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/routine-level-of-parallelism-1.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 3ca8fe71cc4..3517cb783d9 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -14999,6 +14999,9 @@ c_parser_oacc_routine (c_parser *parser, enum pragma_context context)
   data.clauses
 	= c_parser_oacc_all_clauses (parser, OACC_ROUTINE_CLAUSE_MASK,
  "#pragma acc routine");
+  /* The clauses are in reverse order; fix that to make later diagnostic
+	 emission easier.  */
+  data.clauses = nreverse (data.clauses);
 
   if (TREE_CODE (decl) != FUNCTION_DECL)
 	{
@@ -15013,6 +15016,9 @@ c_parser_oacc_routine (c_parser *parser, enum pragma_context context)
   data.clauses
 	= c_parser_oacc_all_clauses (parser, OACC_ROUTINE_CLAUSE_MASK,
  "#pragma acc routine");
+  /* The clauses are in reverse order; fix that to make later diagnostic
+	 emission easier.  */
+  data.clauses = nreverse (data.clauses);
 
   /* Emit a helpful diagnostic if there's another pragma following this
 	 one.  Also don't allow a static assertion declaration, as in the
@@ -15076,6 +15082,8 @@ c_finish_oacc_routine (struct oacc_routine_data *data, tree fndecl,
   return;
 }
 
+  oacc_verify_routine_clauses (&data->clauses, data->loc);
+
   if (oacc_get_fn_attrib (fndecl))
 {
   error_at (data->loc,
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 241226d8c21..fa7ee7798ae 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -38117,6 +38117,9 @@ cp_parser_oacc_routine (cp_parser *parser, cp_token *pragma_tok,
 	= cp_parser_oacc_all_clauses (parser, OACC_ROUTINE_CLAUSE_MASK,
   "#pragma acc routine",
   cp_lexer_peek_token (parser->lexer));
+  /* The clauses are in reverse order; fix that to make later diagnostic
+	 emission easier.  */
+  data.clauses = nreverse (data.clauses);
 
   if (decl && is_overloaded_fn (decl)
 	  && (TREE_CODE (decl) != FUNCTION_DECL
@@ -38213,6 +38216,9 @@ cp_parser_late_parsing_oacc_routine (cp_parser *parser, tree attrs)
   parser->oacc_routine->clauses
 = cp_parser_oacc_all_clauses (parser, OACC_ROUTINE_CLAUSE_MASK,
   "#pragma acc routine", pragma_tok);
+  /* The clauses are in reverse order; fix that to make later diagnostic
+ emission easier.  */
+  parser->oacc_routine->clauses = nreverse (parser->oacc_routine->clauses);
   cp_parser_pop_lexer (parser);
   /* Later, cp_finalize_oacc_routine will process the clauses, and then set
  fndecl_seen.  */
@@ -38247,6 +38253,9 @@ cp_finalize_oacc_routine (cp_parser *parser, tree fndecl, bool is_defn)
 	  return;
 	}
 
+  oacc_verify_routine_clauses (&parser->oacc_routine->clauses,
+   parser->oacc_routine->loc);
+
   if (oacc_get_fn_attrib (fndecl))
 	{
 	  error_at (parser->oacc_routine->loc,
diff --git a/gcc/omp-general.c b/gcc/omp-general.c
index cac6de2..3ea2224957d 100644
--- a/gcc/omp-general.c
+++ b/gcc/omp-general.c
@@ -559,9 +559,64 @@ oacc_set_fn_attrib (tree fn, tree clauses, vec *args)
 }
 }
 
-/*  Process the routine's dimension clauess to generate an attribute
-value.  Issue diagn

[patch,openacc] Add support for OpenACC routine nohost clause

2018-10-02 Thread Cesar Philippidis
Attached is a patch that introduces support for the acc routine nohost
clause. Basically, if an acc routine function is marked as nohost, then
the compiler does not generate code for the host. It's kind of strange
to test for. Basically, we had to use acc_on_device at -O2 so that the
host references to the dead function get optimized away.

I believe that the nohost clause was added for acc routines to allow
offloaded acc code to call vendor libraries, such as cuBLAS, which are
only available for specific accelerators. I haven't seen it used much in
practice though.
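
A rough sketch of the intended usage (hypothetical names, along the
lines of the new libgomp tests):

#include <openacc.h>

/* Only ever called from offloaded code, so no host version is needed.  */
#pragma acc routine nohost
float device_only_helper (float);

float
f (float x)
{
  float r = x;
#pragma acc parallel copy (r)
  {
    /* Guarding with acc_on_device lets the host reference be
       optimized away at -O2, as described above.  */
    if (acc_on_device (acc_device_not_host))
      r = device_only_helper (r);
  }
  return r;
}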

Is this OK for trunk? I bootstrapped and regtested it for x86_64 Linux
with nvptx offloading.

Thanks
Cesar
[OpenACC] Add support for OpenACC routine nohost clause

(was OpenACC bind, nohost changes)

2018-XX-YY  Thomas Schwinge  
	Cesar Philippidis  

	gcc/
	* tree-core.h (omp_clause_code): Add OMP_CLAUSE_NOHOST.
	* tree.c (omp_clause_num_ops, omp_clause_code_name, walk_tree_1):
	Update for these.
	* tree-pretty-print.c (dump_omp_clause): Handle	OMP_CLAUSE_NOHOST.
	* gimplify.c (gimplify_scan_omp_clauses)
	(gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_NOHOST.
	* tree-nested.c (convert_nonlocal_omp_clauses)
	(convert_local_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Likewise.
	* omp-offload.c (maybe_discard_oacc_function): New function.
	(execute_oacc_device_lower) [!ACCEL_COMPILER]: Handle OpenACC
	nohost clauses.

	gcc/c-family/
	* c-attribs.c (c_common_attribute_table): Set min_len to -1 for
	"omp declare target".
	* c-pragma.h (pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_NOHST.

	gcc/c/
	* c-parser.c (c_parser_omp_clause_name): Handle "nohost".
	(c_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_NOHOST.
	(c_parser_oacc_routine, c_finish_oacc_routine): Update.
	* c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_NOHOST.

	gcc/cp/
	* parser.c (cp_parser_omp_clause_name): Handle "nohost".
	(cp_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_NOHOST,
	(cp_parser_oacc_routine, cp_finalize_oacc_routine): Update.
	* pt.c (tsubst_omp_clauses): Handle OMP_CLAUSE_NOHOST.
	* semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_NOHOST.

	gcc/fortran/
	* gfortran.h (gfc_omp_clauses): Add nohost members.
	* openmp.c (omp_mask2): Add OMP_CLAUSE_NOHOST.
	(gfc_match_omp_clauses): Handle OMP_CLAUSE_NOHOST.
	(gfc_match_oacc_routine): Set oacc_function_nohost when appropriate.
	* gfortran.h (symbol_attribute): Add oacc_function_nohost member.
	* trans-openmp.c (gfc_add_omp_offload_attributes): Use it to decide
	whether to generate an OMP_CLAUSE_NOHOST clause.
	(gfc_trans_omp_clauses_1): Unreachable code to generate an
	OMP_CLAUSE_NOHOST clause.

	gcc/testsuite/
	* c-c++-common/goacc/classify-routine.c: Adjust test.
	* c-c++-common/goacc/routine-1.c: Likewise.
	* c-c++-common/goacc/routine-2.c: Likewise.
	* c-c++-common/goacc/routine-nohost-1.c: New test.
	* g++.dg/goacc/routine-2.C: Adjust test.
	* gfortran.dg/goacc/pr72741.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/routine-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/routine-bind-nohost-1.c:
	Update test.
	* testsuite/libgomp.oacc-fortran/routine-8.f90: Likewise.

(cherry picked from gomp-4_0-branch r223007, r226192, r226259, r228915,
r228916, and r231423)
(cherry picked from gomp-4_0-branch r231973 and r231979)
(cherry picked from gomp-4_0-branch r238847)
---
 gcc/c-family/c-attribs.c  |  2 +-
 gcc/c-family/c-pragma.h   |  1 +
 gcc/c/c-parser.c  | 12 +-
 gcc/c/c-typeck.c  |  1 +
 gcc/cp/parser.c   | 13 +--
 gcc/cp/pt.c   |  1 +
 gcc/cp/semantics.c|  1 +
 gcc/fortran/gfortran.h|  3 +-
 gcc/fortran/openmp.c  | 29 +++---
 gcc/fortran/trans-openmp.c| 15 +++-
 gcc/gimplify.c|  2 +
 gcc/lto/lto.c |  1 +
 gcc/omp-low.c |  2 +
 gcc/omp-offload.c | 38 ---
 .../c-c++-common/goacc/classify-routine.c |  4 +-
 gcc/testsuite/c-c++-common/goacc/routine-1.c  |  8 
 gcc/testsuite/c-c++-common/goacc/routine-2.c  |  8 ++--
 .../c-c++-common/goacc/routine-nohost-1.c | 28 ++
 gcc/testsuite/g++.dg/goacc/routine-2.C|  9 +
 gcc/testsuite/gfortran.dg/goacc/pr72741.f90   | 30 +++
 gcc/tree-core.h   |  3 ++
 gcc/tree-nested.c |  4 ++
 gcc/tree-pretty-print.c   |  3 ++
 gcc/tree.c|  3 ++
 .../libgomp.oacc-c-c++-common/routine-3.c | 33 
 .../routine-nohost-1.c| 18 +
 .../libgomp.oacc-fortran/ro

Re: [committed] Use structure to bubble up information about unterminated strings from c_strlen

2018-10-02 Thread Jeff Law
On 10/1/18 3:46 PM, Christophe Lyon wrote:
> On Sat, 29 Sep 2018 at 18:06, Jeff Law  wrote:
>>
>>
>> This patch changes the NONSTR argument to c_strlen to instead be a
>> little data structure c_strlen can populate with nuggets of information
>> about the string.
>>
>> There's clearly a need for the decl related to the non-string argument.
>> I see an immediate need for the length of a non-terminated string
>> (c_strlen returns NULL for non-terminated strings).  I also see a need
>> for the offset within the non-terminated strong as well.
>>
>> We only populate the structure when c_strlen encounters a non-terminated
>> string.  One could argue we should always fill in the members.  Right
>> now I think filling it in for unterminated cases makes the most sense,
>> but I could be convinced otherwise.
>>
>> I won't be surprised if subsequent warnings from Martin need additional
>> information about the string.  The idea here is we can add more elements
>> to the structure without continually adding arguments to c_strlen.
>>
>> Bootstrapped in isolation as well as with Martin's patches for strnlen
>> and sprintf checking.  Installing on the trunk.
>>
> 
> Hi Jeff,
> 
> + /* If TYPE is asking for a maximum, then use any
> +length (including the length of an unterminated
> +string) for VAL.  */
> + if (type == 2)
> +   val = data.len;
> 
> It seems this part is dead-code, since the case type==2 is handled in
> the "then" part of the "if" (this code is in the "else" part).
> 
> Since you added a comment, I suspect you explicitly tested it, though?
So it's not needed by patch #6 (strnlen), but patch #4 (not yet
committed) does need some work in this area -- but even so I'm not sure
what that's going to look like yet.

So I've pulled the dead code out and pushed that commit to the trunk
after the usual bootstrap and regression test cycle.

Jeff

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 40f87aaeae5..59a01f86436 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2018-10-02  Jeff Law  
+
+   * gimple-fold.c (get_range_strlen): Remove dead code.
+
 2018-10-02  Martin Sebor  
Jeff Law  
 
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index cf04c92180b..fa1fc60876c 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1345,14 +1345,7 @@ get_range_strlen (tree arg, tree length[2], bitmap 
*visited, int type,
  /* If we potentially had a non-terminated string, then
 bubble that information up to the caller.  */
  if (!val)
-   {
- *nonstr = data.decl;
- /* If TYPE is asking for a maximum, then use any
-length (including the length of an unterminated
-string) for VAL.  */
- if (type == 2)
-   val = data.len;
-   }
+   *nonstr = data.decl;
}
 
   if (!val && fuzzy)


[PATCH, rs6000] 2/2 Add x86 SSE3 intrinsics to GCC PPC64LE target

2018-10-02 Thread Paul Clarke
This is part 2/2 for contributing PPC64LE support for X86 SSE3
intrinsics. This patch includes testsuite/gcc.target tests for the
intrinsics defined in pmmintrin.h.

Tested on POWER8 ppc64le and ppc64 (-m64 and -m32, the latter only reporting
10 new unsupported tests.)

[gcc/testsuite]

2018-10-01  Paul A. Clarke  

* sse3-check.h: New file.
* sse3-addsubps.h: New file.
* sse3-addsubpd.h: New file.
* sse3-haddps.h: New file.
* sse3-hsubps.h: New file.
* sse3-haddpd.h: New file.
* sse3-hsubpd.h: New file.
* sse3-lddqu.h: New file.
* sse3-movsldup.h: New file.
* sse3-movshdup.h: New file.
* sse3-movddup.h: New file.

Index: gcc/testsuite/gcc.target/powerpc/pr37191.c
===
--- gcc/testsuite/gcc.target/powerpc/pr37191.c  (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr37191.c  (working copy)
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-options "-O3 -mdirect-move" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+
+#include 
+#include 
+#include 
+
+//extern
+const uint64_t ff_bone;
+
+static inline void transpose4x4(uint8_t *dst, uint8_t *src, ptrdiff_t 
dst_stride, ptrdiff_t src_stride) {
+  __m64 row0 = _mm_cvtsi32_si64(*(unsigned*)(src + (0 * src_stride)));
+  __m64 row1 = _mm_cvtsi32_si64(*(unsigned*)(src + (1 * src_stride)));
+  __m64 row2 = _mm_cvtsi32_si64(*(unsigned*)(src + (2 * src_stride)));
+  __m64 row3 = _mm_cvtsi32_si64(*(unsigned*)(src + (3 * src_stride)));
+  __m64 tmp0 = _mm_unpacklo_pi8(row0, row1);
+  __m64 tmp1 = _mm_unpacklo_pi8(row2, row3);
+  __m64 row01 = _mm_unpacklo_pi16(tmp0, tmp1);
+  __m64 row23 = _mm_unpackhi_pi16(tmp0, tmp1);
+  *((unsigned*)(dst + (0 * dst_stride))) = _mm_cvtsi64_si32(row01);
+  *((unsigned*)(dst + (1 * dst_stride))) = 
_mm_cvtsi64_si32(_mm_unpackhi_pi32(row01, row01));
+  *((unsigned*)(dst + (2 * dst_stride))) = _mm_cvtsi64_si32(row23);
+  *((unsigned*)(dst + (3 * dst_stride))) = 
_mm_cvtsi64_si32(_mm_unpackhi_pi32(row23, row23));
+}
+#if 0
+static inline void h264_loop_filter_chroma_intra_mmx2(uint8_t *pix, int 
stride, int alpha1, int beta1)
+{
+asm volatile(
+""
+:: "r"(pix-2*stride), "r"(pix), "r"((long)stride),
+   "m"(alpha1), "m"(beta1), "m"(ff_bone)
+);
+}
+
+#endif
+void h264_h_loop_filter_chroma_intra_mmx2(uint8_t *pix, int stride, int alpha, 
int beta)
+{
+  uint8_t trans[8*4] __attribute__ ((aligned (8)));
+  transpose4x4(trans, pix-2, 8, stride);
+  transpose4x4(trans+4, pix-2+4*stride, 8, stride);
+//h264_loop_filter_chroma_intra_mmx2(trans+2*8, 8, alpha-1, beta-1);
+  transpose4x4(pix-2, trans, stride, 8);
+  transpose4x4(pix-2+4*stride, trans+4, stride, 8);
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c
===
--- gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c(working copy)
@@ -0,0 +1,102 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse3_test_addsubpd_1
+#endif
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+static void
+sse3_test_addsubpd (double *i1, double *i2, double *r)
+{
+  __m128d t1 = _mm_loadu_pd (i1);
+  __m128d t2 = _mm_loadu_pd (i2);
+
+  t1 = _mm_addsub_pd (t1, t2);
+
+  _mm_storeu_pd (r, t1);
+}
+
+static void
+sse3_test_addsubpd_subsume (double *i1, double *i2, double *r)
+{
+  __m128d t1 = _mm_load_pd (i1);
+  __m128d t2 = _mm_load_pd (i2);
+
+  t1 = _mm_addsub_pd (t1, t2);
+
+  _mm_storeu_pd (r, t1);
+}
+
+static int
+chk_pd (double *v1, double *v2)
+{
+  int i;
+  int n_fails = 0;
+
+  for (i = 0; i < 2; i++)
+if (v1[i] != v2[i])
+  n_fails += 1;
+
+  return n_fails;
+}
+
+static double p1[2] __attribute__ ((aligned(16)));
+static double p2[2] __attribute__ ((aligned(16)));
+static double p3[2];
+static double ck[2];
+
+double vals[] =
+  {
+100.0,  200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5,
+1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52,
+32.6,   123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44,
+12.16,  52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785,
+541.3,  321.4, 231.4, 531.4, 71., 321., 231., -531.,
+23.45,  23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45,
+23.45,  -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1,
+1.234,  2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912,
+-9.32,  -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95,
+9.32,  8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95
+  };
+
+static
+void
+TEST (void)
+{
+  int i;
+  int fail = 0;
+
+  for

[PATCH, rs6000] 1/2 Add x86 SSE3 intrinsics to GCC PPC64LE target

2018-10-02 Thread Paul Clarke
This is a follow-on to earlier commits for adding compatibility
implementations of x86 intrinsics for PPC64LE.  This is the first of
two patches.  This patch adds 11 of the 13 x86 intrinsics from
<pmmintrin.h> ("SSE3").  (Patch 2/2 adds tests for these intrinsics,
and briefly describes the tests performed.)

Implementations are relatively straightforward, with occasional
extra effort for vector element ordering.

Not implemented are _mm_mwait and _mm_monitor, as there are no
direct or trivial analogs in the POWER ISA.
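
As a rough usage sketch (the tests in patch 2/2 are the authoritative
examples; assumes -mpower8-vector or equivalent VSX support):

/* Compile with -DNO_WARN_X86_INTRINSICS to silence the porting warning.  */
#define NO_WARN_X86_INTRINSICS 1
#include <pmmintrin.h>

void
addsub (const float *a, const float *b, float *r)
{
  __m128 x = _mm_loadu_ps (a);
  __m128 y = _mm_loadu_ps (b);
  _mm_storeu_ps (r, _mm_addsub_ps (x, y));
}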

./gcc/ChangeLog:

2018-10-02  Paul A. Clarke  

* config.gcc (powerpc*-*-*): Add pmmintrin.h.
* config/rs6000/pmmintrin.h: New file.

Index: gcc/config/rs6000/pmmintrin.h
===
--- gcc/config/rs6000/pmmintrin.h   (nonexistent)
+++ gcc/config/rs6000/pmmintrin.h   (working copy)
@@ -0,0 +1,177 @@
+/* Copyright (C) 2003-2018 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 9.0.  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.
+
+   In the specific case of X86 SSE3 intrinsics, the PowerPC VMX/VSX ISA
+   is a good match for most SIMD operations.  However the Horizontal
+   add/sub requires the data pairs be permuted into separate
+   registers with vertical even/odd alignment for the operation.
+   And the addsub operation requires the sign of only the even numbered
+   elements be flipped (xored with -0.0).
+   For larger blocks of code using these intrinsic implementations,
+   the compiler should be able to schedule instructions to avoid
+   additional latency.
+
+   In the specific case of the monitor and mwait instructions there is
+   no direct equivalent in the PowerISA at this time.  So those
+   intrinsics are not implemented.  */
+#error "Please read comment above.  Use -DNO_WARN_X86_INTRINSICS to disable 
this warning."
+#endif
+
+#ifndef _PMMINTRIN_H_INCLUDED
+#define _PMMINTRIN_H_INCLUDED
+
+/* We need definitions from the SSE2 and SSE header files.  */
+#include <emmintrin.h>
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_addsub_ps (__m128 __X, __m128 __Y)
+{
+  const __v4sf even_n0 = {-0.0, 0.0, -0.0, 0.0};
+  __v4sf even_neg_Y = vec_xor(__Y, even_n0);
+  return (__m128) vec_add (__X, even_neg_Y);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_addsub_pd (__m128d __X, __m128d __Y)
+{
+  const __v2df even_n0 = {-0.0, 0.0};
+  __v2df even_neg_Y = vec_xor(__Y, even_n0);
+  return (__m128d) vec_add (__X, even_neg_Y);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_hadd_ps (__m128 __X, __m128 __Y)
+{
+  __vector unsigned char xform2 = {
+#ifdef __LITTLE_ENDIAN__
+  0x00, 0x01, 0x02, 0x03,  0x08, 0x09, 0x0A, 0x0B,  0x10, 0x11, 0x12, 
0x13,  0x18, 0x19, 0x1A, 0x1B
+#elif __BIG_ENDIAN__
+  0x14, 0x15, 0x16, 0x17,  0x1C, 0x1D, 0x1E, 0x1F,  0x04, 0x05, 0x06, 
0x07,  0x0C, 0x0D, 0x0E, 0x0F
+#endif
+};
+  __vector unsigned char xform1 = {
+#ifdef __LITTLE_ENDIAN__
+  0x04, 0x05, 0x06, 0x07,  0x0C, 0x0D, 0x0E, 0x0F,  0x14, 0x15, 0x16, 
0x17,  0x1C, 0x1D, 0x1E, 0x1F
+#elif __BIG_ENDIAN__
+  0x10, 0x11, 0x12, 0x13,  0x18, 0x19, 0x1A, 0x1B,  0x00, 0x01, 0x02, 
0x03,  0x08, 0x09, 0x0A, 0x0B
+#endif
+};
+  return (__m128) vec_add (vec_perm ((__v4sf) __X, (__v4sf) __Y, xform2),
+  vec_perm ((__v4sf) __X, (__v4sf) __Y, xform1));
+}
+
+extern __inline __m12

[PATCH] rs6000: Fix vec-init-6.c (PR87081)

2018-10-02 Thread Segher Boessenkool
For a while now we have used a rldimi instead of rldicl/rldicr/or to combine
two words into one.


2018-10-02  Segher Boessenkool  

PR target/87081
* gcc.target/powerpc/vec-init-6.c: Fix expected asm.

---
 gcc/testsuite/gcc.target/powerpc/vec-init-6.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/vec-init-6.c 
b/gcc/testsuite/gcc.target/powerpc/vec-init-6.c
index 8d610e1..f574da3 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-init-6.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-init-6.c
@@ -9,8 +9,7 @@ merge (int a, int b, int c, int d)
   return (vector int) { a, b, c, d };
 }
 
-/* { dg-final { scan-assembler "rldicr" } } */
-/* { dg-final { scan-assembler "rldicl" } } */
+/* { dg-final { scan-assembler-times {\mrldi} 2 } } */
 /* { dg-final { scan-assembler "mtvsrd" } } */
 /* { dg-final { scan-assembler-not "stw"} } */
 /* { dg-final { scan-assembler-not "lxvw4x" } } */
-- 
1.8.3.1



Re: [PATCH c++/pr87295] -fdebug-types-section ICE

2018-10-02 Thread Nathan Sidwell

Ping?

On 9/17/18 7:30 AM, Nathan Sidwell wrote:

Richard,
this patch makes the ICE go away, but I really don't know if it's 
correct.  When cloning the type die I copy die_id, so it is found during 
the (currently ICEing) hash lookup.  In this particular testcase we 
clone the die twice.  Once from break_out_comdat_types and once from 
copy_decls_for_unworthy_types.  That seems a little strange (but I have 
only the smallest understanding of debug data), as these tracebacks show:


Breakpoint 5, clone_as_declaration (
     die=>)

     at ../../../src/gcc/dwarf2out.c:8179
8179  clone->die_id = die->die_id;
(gdb) back
#0  clone_as_declaration (die=DW_TAG_structure_type >)

     at ../../../src/gcc/dwarf2out.c:8179
#1  0x00e34faf in generate_skeleton_ancestor_tree 
(node=0x7fffe230) at ../../../src/gcc/dwarf2out.c:8345
#2  0x00e351c3 in generate_skeleton_bottom_up 
(parent=0x7fffe230) at ../../../src/gcc/dwarf2out.c:8419

#3  0x00e35291 in generate_skeleton (
     die=>)

     at ../../../src/gcc/dwarf2out.c:8449
#4  0x00e352ce in remove_child_or_replace_with_skeleton 
(unit=,
     child=>,
     prev=>)

     at ../../../src/gcc/dwarf2out.c:8471
#5  0x00e3570a in break_out_comdat_types (die=0x773260a0 DW_TAG_compile_unit>)

     at ../../../src/gcc/dwarf2out.c:8636
#6  0x00e75153 in dwarf2out_early_finish (filename=0x34de9d0 
"bug.ii") at ../../../src/gcc/dwarf2out.c:32034
#7  0x00daa210 in symbol_table::finalize_compilation_unit 
(this=0x7730d100) at ../../../src/gcc/cgraphunit.c:2783

#8  0x01417ec2 in compile_file () at ../../../src/gcc/toplev.c:480
#9  0x0141a90c in do_compile () at ../../../src/gcc/toplev.c:2170
#10 0x0141abf8 in toplev::main (this=0x7fffe56e, argc=8, 
argv=0x7fffe668) at ../../../src/gcc/toplev.c:2305
#11 0x0227731e in main (argc=8, argv=0x7fffe668) at 
../../../src/gcc/main.c:39

(gdb) p clone
$2 = 
(gdb) c
Continuing.

Breakpoint 5, clone_as_declaration (die=DW_TAG_structure_type >)

     at ../../../src/gcc/dwarf2out.c:8179
8179  clone->die_id = die->die_id;
(gdb) p clone
$3 = 
(gdb) back
#0  clone_as_declaration (die=DW_TAG_structure_type >)

     at ../../../src/gcc/dwarf2out.c:8179
#1  0x00e34dac in copy_ancestor_tree (unit=0x773260a0 DW_TAG_compile_unit>,
     die=>, decl_table=0x7fffe2f0)

     at ../../../src/gcc/dwarf2out.c:8267
#2  0x00e35adb in copy_decls_walk (unit=0x773260a0 DW_TAG_compile_unit>,
     die=>, decl_table=0x7fffe2f0)

     at ../../../src/gcc/dwarf2out.c:8764
#3  0x00e35ba6 in copy_decls_walk (unit=0x773260a0 DW_TAG_compile_unit>,
     die=>, decl_table=0x7fffe2f0)

     at ../../../src/gcc/dwarf2out.c:8790
#4  0x00e35ba6 in copy_decls_walk (unit=0x773260a0 DW_TAG_compile_unit>,
     die=, 
decl_table=0x7fffe2f0) at ../../../src/gcc/dwarf2out.c:8790
#5  0x00e35c09 in copy_decls_for_unworthy_types 
(unit=)

     at ../../../src/gcc/dwarf2out.c:8804
#6  0x00e7519a in dwarf2out_early_finish (filename=0x34de9d0 
"bug.ii") at ../../../src/gcc/dwarf2out.c:32047
#7  0x00daa210 in symbol_table::finalize_compilation_unit 
(this=0x7730d100) at ../../../src/gcc/cgraphunit.c:2783

#8  0x01417ec2 in compile_file () at ../../../src/gcc/toplev.c:480
#9  0x0141a90c in do_compile () at ../../../src/gcc/toplev.c:2170
#10 0x0141abf8 in toplev::main (this=0x7fffe56e, argc=8, 
argv=0x7fffe668) at ../../../src/gcc/toplev.c:2305
#11 0x0227731e in main (argc=8, argv=0x7fffe668) at 
../../../src/gcc/main.c:39

(gdb)





--
Nathan Sidwell
2018-09-17  Nathan Sidwell  

	PR c++/87295
	* dwarf2out.c (clone_as_declaration): Copy die_id and set
	comdat_type_p as appropriate.

Index: dwarf2out.c
===
--- dwarf2out.c	(revision 264332)
+++ dwarf2out.c	(working copy)
@@ -8176,8 +8176,12 @@ clone_as_declaration (dw_die_ref die)
 }
 }
 
+  clone->die_id = die->die_id;
   if (die->comdat_type_p)
-add_AT_die_ref (clone, DW_AT_signature, die);
+{
+  clone->comdat_type_p = true;
+  add_AT_die_ref (clone, DW_AT_signature, die);
+}
 
   add_AT_flag (clone, DW_AT_declaration, 1);
   return clone;
Index: testsuite/g++.dg/debug/pr87295.C
===
--- testsuite/g++.dg/debug/pr87295.C	(revision 0)
+++ testsuite/g++.dg/debug/pr87295.C	(working copy)
@@ -0,0 +1,20 @@
+// PR c++/87295 ICE in dwarf2out
+// { dg-options "-flto -ffat-lto-objects -fdebug-types-section -g -std=gnu++17" }
+
+template
+struct integral_constant
+{
+  static constexpr _Tp value = __v;
+  typedef _Tp value_type;
+  constexpr operator value_type() const noexcept { return value; }
+};
+
+typedef integral_constant false_type;
+
+template
+struct __or_;
+
+template<>
+struct __or_<>
+  : public false_type
+{ };


Re: [PATCH 0/2][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-10-02 Thread Jeff Law
On 10/1/18 9:52 PM, Peter Bergner wrote:
> On 10/1/18 7:45 AM, H.J. Lu wrote:
>> You may have undone:
>>
>> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=218059
> 
> Yes, the code above also needed to be modified to handle conflicts being
> added at definitions rather than at uses.  The patch below does that.
> I don't really have access to a i686 (ie, 32-bit) system to test on and
> I'm not sure how to force the test to be run in 32-bit mode on a 64-bit
> build, but it does fix the assembler for the pr63534.c test case.
> 
> That said, looking at the rtl for the test case, I see the following
> before RA:
> 
> (insn 5 2 6 2 (set (reg:SI 3 bx)
> (reg:SI 82)) "pr63534.c":10 85 {*movsi_internal}
>  (nil))
> (call_insn 6 5 7 2 (call (mem:QI (symbol_ref:SI ("bar") [flags 0x41]  
> ) [0 barD.1498 S1 A8])
> (const_int 0 [0])) "pr63534.c":10 687 {*call}
>  (expr_list:REG_DEAD (reg:SI 3 bx)
> (expr_list:REG_CALL_DECL (symbol_ref:SI ("bar") [flags 0x41]  
> )
> (nil)))
> (expr_list (use (reg:SI 3 bx))
> (nil)))
> (insn 7 6 8 2 (set (reg:SI 3 bx)
> (reg:SI 82)) "pr63534.c":11 85 {*movsi_internal}
>  (expr_list:REG_DEAD (reg:SI 82)
> (nil)))
> (call_insn 8 7 0 2 (call (mem:QI (symbol_ref:SI ("bar") [flags 0x41]  
> ) [0 barD.1498 S1 A8])
> (const_int 0 [0])) "pr63534.c":11 687 {*call}
>  (expr_list:REG_DEAD (reg:SI 3 bx)
> (expr_list:REG_CALL_DECL (symbol_ref:SI ("bar") [flags 0x41]  
> )
> (nil)))
> (expr_list (use (reg:SI 3 bx))
> (nil)))
> 
> Now that we handle conflicts at definitions and the pic hard reg
> is set via a copy from the pic pseudo, my PATCH 2 is setup to
> handle exactly this scenario (ie, a copy between a pseudo and
> a hard reg).  I looked at the asm output from a build with both
> PATCH 1 and PATCH 2, and yes, it also does not add the conflict
> between the pic pseudo and pic hard reg, so our other option to
> fix PR87479 is to apply PATCH 2.  However, since PATCH 2 handles
> the pic pseudo and pic hard reg conflict itself, that means we
> don't need the special pic conflict code and it can be removed!
> I'm going to update PATCH 2 to remove that pic handling code
> and send it through bootstrap and regtesting.
> 
> H.J., can you confirm that the following patch not only fixes
> the bug you opened, but also doesn't introduce any more?
> Once I've updated PATCH 2, I'd like you to test/bless that
> one as well.  Thanks.
Haven't looked at the patch yet.  The easiest (but not fastest) way to
build i686 native is gcc45 in the build farm.

Jeff


Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-02 Thread Qing Zhao


> On Oct 2, 2018, at 9:02 AM, Martin Liška  wrote:
> 
> On 10/2/18 3:28 PM, Qing Zhao wrote:
>> 
>>> On Oct 2, 2018, at 3:33 AM, Martin Jambor  wrote:
>>> 
>>> Hi,
>>> 
>>> my apologies for being terse, I'm in a meeting.
>>> 
>>> On Mon, Oct 01 2018, Qing Zhao wrote:
 Hi, Martin,
 
 I have studied a little more on
 
 https://github.com/marxin/kgraft-analysis-tool/blob/master/README.md 
 
 
 in the Section “Usages”, from the example, we can see:
 
 the tool will report a list of affected functions for a function that will 
 be patched.
 In this list, it includes all callers of the patched function, and the 
 cloned functions from the patched function due to ipa const-propogation or 
 ipa sra. 
 
 My question:
 
 what’s the current action to handle the cloned functions from the
 patched function due to ipa const-proposation or ipa sra, etc?
>>> 
>>> If we want to patch an inlined, cloned, or IPA-SRAed function, we also
>>> patch all of its callers.
>> 
>> take the example from the link:
>> 
>> $ gcc /home/marxin/Programming/linux/aesni-intel_glue.i -O2 
>> -fdump-ipa-clones -c
>> $ ./kgraft-ipa-analysis.py aesni-intel_glue.i.000i.ipa-clones
>> 
>> [..skipped..]
>> Function: fls64/63 (./arch/x86/include/asm/bitops.h:479:90)
>>  inlining to: __ilog2_u64/132 (include/linux/log2.h:40:5)
>>inlining to: ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
>>  constprop: ablkcipher_request_alloc.constprop.8/3198 
>> (include/linux/crypto.h:979:82)
>>inlining to: helper_rfc4106_decrypt/3007 
>> (arch/x86/crypto/aesni-intel_glue.c:1016:12)
>>inlining to: helper_rfc4106_encrypt/3006 
>> (arch/x86/crypto/aesni-intel_glue.c:939:12)
>> 
>>  Affected functions: 5
>>__ilog2_u64/132 (include/linux/log2.h:40:5)
>>ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
>>ablkcipher_request_alloc.constprop.8/3198 (include/linux/crypto.h:979:82)
>>helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
>>helper_rfc4106_encrypt/3006 (arch/x86/crypto/aesni-intel_glue.c:939:12)
>> [..skipped..]
>> 
>> 
>> if we want to patch the function “fls64/63”,  what else functions we need to 
>> patch, too? my guess is:
> 
> Hi.
> 
> Yes, 'Affected functions' is exactly the list of functions you want to patch.
> 
>> 
>> **all the callers:
>> __ilog2_u64/132
>> ablkcipher_request_alloc/1639
>> helper_rfc4106_decrypt/3007
>> helper_rfc4106_encrypt/3006 
>> **and:
>> ablkcipher_request_alloc.constprop.8/3198
>> is the above correct?
>> 
>> how to generate patch for ablkcipher_request_alloc.constprop.8/3198? since 
>> it’s not a function in the source code?
> 
> Well, it's a 'static inline' function in a header file thus the function will 
> be inlined in all usages.
> In this particular case there's no such caller function, so you're fine.

So, for cloned functions, you have to analyze them case by case manually to see 
their callers?
Why not just disable ipa-cp or ipa-sra completely?

Qing
> 



[patch,openacc] Repeated use of the OpenACC routine directive

2018-10-02 Thread Cesar Philippidis
This is another patch that teaches the C and C++ front ends to emit more errors
involving acc routine clauses. In retrospect, I should have merged it
together with the patch I posted here; however, at
the time I thought it would make the patch too large.
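
As a rough sketch of what the extra checking rejects (not one of the new
test cases verbatim), repeated routine directives for the same function
with different levels of parallelism; repeating compatible clauses
remains accepted:

#pragma acc routine worker
void f (void);

/* Conflicting level of parallelism for the same function: now diagnosed.  */
#pragma acc routine vector
void f (void);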

Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
Linux with nvptx offloading. This patch is also self-contained to the
OpenACC code path.

Thanks,
Cesar
[OpenACC] Repeated use of the OpenACC routine directive

2018-XX-YY  Thomas Schwinge  
	Cesar Philippidis  

	gcc/
	* omp-general.h (oacc_verify_routine_clauses): Declare.
	* omp-general.c (oacc_verify_routine_clauses): Change formal
	parameters.  Add checking if already marked as an accelerator
	routine.  Adjust all users.

	gcc/c/
	* c-parser.c (c_finish_oacc_routine): Rework checking if already
	marked as an accelerator routine.

	gcc/cp/
	* parser.c (cp_finalize_oacc_routine): Rework checking if already
	marked as an accelerator routine.

	gcc/testsuite/
	* c-c++-common/goacc/routine-1.c: Update tests.
	* c-c++-common/goacc/routine-5.c: Likewise.
	* c-c++-common/goacc/routine-level-of-parallelism-1.c: Likewise.
	* c-c++-common/goacc/routine-level-of-parallelism-2.c: New test.
	* c-c++-common/goacc/routine-nohost-1.c: Update tests.
	* c-c++-common/goacc/routine-nohost-2.c: New test.


(cherry picked from gomp-4_0-branch r239521)

remove bind clause support
---
 gcc/c/c-parser.c  |  46 ++--
 gcc/cp/parser.c   |  50 ++--
 gcc/omp-general.c | 105 +++-
 gcc/omp-general.h |   3 +-
 gcc/testsuite/c-c++-common/goacc/routine-1.c  |  10 +-
 gcc/testsuite/c-c++-common/goacc/routine-5.c  |   4 +-
 .../goacc/routine-level-of-parallelism-1.c| 233 --
 .../goacc/routine-level-of-parallelism-2.c|  73 ++
 .../c-c++-common/goacc/routine-nohost-1.c |  20 ++
 .../c-c++-common/goacc/routine-nohost-2.c |  97 
 10 files changed, 566 insertions(+), 75 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/routine-level-of-parallelism-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/routine-nohost-2.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 187a2dec999..3d5cbe76acf 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15090,35 +15090,39 @@ c_finish_oacc_routine (struct oacc_routine_data *data, tree fndecl,
   return;
 }
 
-  oacc_verify_routine_clauses (&data->clauses, data->loc);
-
-  if (oacc_get_fn_attrib (fndecl))
+  int compatible
+= oacc_verify_routine_clauses (fndecl, &data->clauses, data->loc,
+   "#pragma acc routine");
+  if (compatible < 0)
 {
-  error_at (data->loc,
-		"%<#pragma acc routine%> already applied to %qD", fndecl);
   data->error_seen = true;
   return;
 }
-
-  if (TREE_USED (fndecl) || (!is_defn && DECL_SAVED_TREE (fndecl)))
+  if (compatible > 0)
 {
-  error_at (data->loc,
-		TREE_USED (fndecl)
-		? G_("%<#pragma acc routine%> must be applied before use")
-		: G_("%<#pragma acc routine%> must be applied before "
-		 "definition"));
-  data->error_seen = true;
-  return;
 }
+  else
+{
+  if (TREE_USED (fndecl) || (!is_defn && DECL_SAVED_TREE (fndecl)))
+	{
+	  error_at (data->loc,
+		TREE_USED (fndecl)
+		? G_("%<#pragma acc routine%> must be applied before use")
+		: G_("%<#pragma acc routine%> must be applied before"
+			 " definition"));
+	  data->error_seen = true;
+	  return;
+	}
 
-  /* Process the routine's dimension clauses.  */
-  tree dims = oacc_build_routine_dims (data->clauses);
-  oacc_replace_fn_attrib (fndecl, dims);
+  /* Set the routine's level of parallelism.  */
+  tree dims = oacc_build_routine_dims (data->clauses);
+  oacc_replace_fn_attrib (fndecl, dims);
 
-  /* Add an "omp declare target" attribute.  */
-  DECL_ATTRIBUTES (fndecl)
-= tree_cons (get_identifier ("omp declare target"),
-		 data->clauses, DECL_ATTRIBUTES (fndecl));
+  /* Add an "omp declare target" attribute.  */
+  DECL_ATTRIBUTES (fndecl)
+	= tree_cons (get_identifier ("omp declare target"),
+		 data->clauses, DECL_ATTRIBUTES (fndecl));
+}
 
   /* Remember that we've used this "#pragma acc routine".  */
   data->fndecl_seen = true;
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d56105ca177..0d314d63cfd 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -38260,36 +38260,42 @@ cp_finalize_oacc_routine (cp_parser *parser, tree fndecl, bool is_defn)
 	  return;
 	}
 
-  oacc_verify_routine_clauses (&parser->oacc_routine->clauses,
-   parser->oacc_routine->loc);
-
-  if (oacc_get_fn_attrib (fndecl))
+  int compatible
+	= oacc_verify_routine_clauses (fndecl, &parser->oacc_routine->clauses,
+   parser->oacc_routine->loc,
+   "#pragma acc routine");
+  if (compatible < 0)
 	{
-	  error_at (parser->oac

Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-02 Thread Martin Liška
On 10/2/18 4:46 PM, Qing Zhao wrote:
> 
>> On Oct 2, 2018, at 9:02 AM, Martin Liška  wrote:
>>
>> On 10/2/18 3:28 PM, Qing Zhao wrote:
>>>
 On Oct 2, 2018, at 3:33 AM, Martin Jambor  wrote:

 Hi,

 my apologies for being terse, I'm in a meeting.

 On Mon, Oct 01 2018, Qing Zhao wrote:
> Hi, Martin,
>
> I have studied a little more on
>
> https://github.com/marxin/kgraft-analysis-tool/blob/master/README.md 
> 
>
> in the Section “Usages”, from the example, we can see:
>
> the tool will report a list of affected functions for a function that 
> will be patched.
> In this list, it includes all callers of the patched function, and the 
> cloned functions from the patched function due to ipa const-propogation 
> or ipa sra. 
>
> My question:
>
> what’s the current action to handle the cloned functions from the
> patched function due to ipa const-proposation or ipa sra, etc?

 If we want to patch an inlined, cloned, or IPA-SRAed function, we also
 patch all of its callers.
>>>
>>> take the example from the link:
>>>
>>> $ gcc /home/marxin/Programming/linux/aesni-intel_glue.i -O2 
>>> -fdump-ipa-clones -c
>>> $ ./kgraft-ipa-analysis.py aesni-intel_glue.i.000i.ipa-clones
>>>
>>> [..skipped..]
>>> Function: fls64/63 (./arch/x86/include/asm/bitops.h:479:90)
>>>  inlining to: __ilog2_u64/132 (include/linux/log2.h:40:5)
>>>inlining to: ablkcipher_request_alloc/1639 
>>> (include/linux/crypto.h:979:82)
>>>  constprop: ablkcipher_request_alloc.constprop.8/3198 
>>> (include/linux/crypto.h:979:82)
>>>inlining to: helper_rfc4106_decrypt/3007 
>>> (arch/x86/crypto/aesni-intel_glue.c:1016:12)
>>>inlining to: helper_rfc4106_encrypt/3006 
>>> (arch/x86/crypto/aesni-intel_glue.c:939:12)
>>>
>>>  Affected functions: 5
>>>__ilog2_u64/132 (include/linux/log2.h:40:5)
>>>ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
>>>ablkcipher_request_alloc.constprop.8/3198 (include/linux/crypto.h:979:82)
>>>helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
>>>helper_rfc4106_encrypt/3006 (arch/x86/crypto/aesni-intel_glue.c:939:12)
>>> [..skipped..]
>>>
>>>
>>> if we want to patch the function “fls64/63”,  what else functions we need 
>>> to patch, too? my guess is:
>>
>> Hi.
>>
>> Yes, 'Affected functions' is exactly the list of functions you want to patch.
>>
>>>
>>> **all the callers:
>>> __ilog2_u64/132
>>> ablkcipher_request_alloc/1639
>>> helper_rfc4106_decrypt/3007
>>> helper_rfc4106_encrypt/3006 
>>> **and:
>>> ablkcipher_request_alloc.constprop.8/3198
>>> is the above correct?
>>>
>>> how to generate patch for ablkcipher_request_alloc.constprop.8/3198? since 
>>> it’s not a function in the source code?
>>
>> Well, it's a 'static inline' function in a header file thus the function 
>> will be inlined in all usages.
>> In this particular case there's no such caller function, so you're fine.
> 
> So, for cloned functions, you have to analyze them case by case manually to 
> see their callers?

No, the tool should provide a complete list of affected functions.

> why not just disable ipa-cp or ipa-sra completely?

Because the optimizations create function clones, which are trackable with my
tool, so one then knows all affected functions.

You can disable the optimizations, but you'll miss some performance benefit
provided by the compiler.

Note that, as Martin Jambor mentioned in point 2), there are also IPA
optimizations that do not create clones.  These should be listed and
eventually disabled for kernel live patching.

Martin

> 
> Qing
>>
> 



[patch,openacc] Check clauses with intrinsic function specified in !$ACC ROUTINE ( NAME )

2018-10-02 Thread Cesar Philippidis
This patch allows Fortran intrinsic functions to be declared as acc
routines. For instance, abort can now be called from within offloaded
acc regions.

Given that intrinsic functions like sin and cos are important for
offloaded functions, I wonder if there is a better way to accomplish
enabling this. Maybe certain intrinsic functions should default to
having an implied acc routine directive. But I suppose that's something
for another patch.

Is this OK for trunk? I bootstrapped and regtested it for x86_64 Linux
with nvptx offloading.

Thanks,
Cesar
[PR fortran/72741] Check clauses with intrinsic function specified in !$ACC ROUTINE ( NAME )

2018-XX-YY  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (gfc_match_oacc_routine): Check clauses of intrinsic
	functions.

	gcc/testsuite/
	* gfortran.dg/goacc/fixed-1.f: Update test.
	* gfortran.dg/goacc/pr72741-2.f: New test.
	* gfortran.dg/goacc/pr72741-intrinsic-1.f: New test.
	* gfortran.dg/goacc/pr72741-intrinsic-2.f: New test.
	* gfortran.dg/goacc/pr72741.f90: Update test.

	libgomp/
	* testsuite/libgomp.oacc-fortran/abort-1.f90: Update test.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Update test.

(cherry picked from gomp-4_0-branch r239422)
(cherry picked from gomp-4_0-branch r239515, and r247954)
---
 gcc/fortran/openmp.c  | 41 +++
 gcc/testsuite/gfortran.dg/goacc/fixed-1.f |  2 +
 gcc/testsuite/gfortran.dg/goacc/pr72741-2.f   | 39 ++
 .../gfortran.dg/goacc/pr72741-intrinsic-1.f   | 16 
 .../gfortran.dg/goacc/pr72741-intrinsic-2.f   | 22 ++
 gcc/testsuite/gfortran.dg/goacc/pr72741.f90   | 20 +++--
 .../libgomp.oacc-fortran/abort-1.f90  |  1 +
 .../libgomp.oacc-fortran/acc_on_device-1-2.f  |  1 +
 8 files changed, 130 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/pr72741-2.f
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/pr72741-intrinsic-1.f
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/pr72741-intrinsic-2.f

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 60ecaf54523..58cbe0ae90c 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -2288,8 +2288,9 @@ match
 gfc_match_oacc_routine (void)
 {
   locus old_loc;
-  gfc_symbol *sym = NULL;
   match m;
+  gfc_intrinsic_sym *isym = NULL;
+  gfc_symbol *sym = NULL;
   gfc_omp_clauses *c = NULL;
   gfc_oacc_routine_name *n = NULL;
   oacc_function dims;
@@ -2311,12 +2312,14 @@ gfc_match_oacc_routine (void)
   if (m == MATCH_YES)
 {
   char buffer[GFC_MAX_SYMBOL_LEN + 1];
-  gfc_symtree *st;
+  gfc_symtree *st = NULL;
 
   m = gfc_match_name (buffer);
   if (m == MATCH_YES)
 	{
-	  st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
+	  if ((isym = gfc_find_function (buffer)) == NULL
+	  && (isym = gfc_find_subroutine (buffer)) == NULL)
+	st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
 	  if (st)
 	{
 	  sym = st->n.sym;
@@ -2325,7 +2328,7 @@ gfc_match_oacc_routine (void)
 	sym = NULL;
 	}
 
-	  if (st == NULL
+	  if ((isym == NULL && st == NULL)
 	  || (sym
 		  && !sym->attr.external
 		  && !sym->attr.function
@@ -2337,6 +2340,13 @@ gfc_match_oacc_routine (void)
 	  gfc_current_locus = old_loc;
 	  return MATCH_ERROR;
 	}
+
+	  /* Set sym to NULL if it matches the current procedure's
+	 name.  This will simplify the check for duplicate ACC
+	 ROUTINE attributes.  */
+	  if (gfc_current_ns->proc_name
+	  && !strcmp (buffer, gfc_current_ns->proc_name->name))
+	sym = NULL;
 	}
   else
 {
@@ -2357,15 +2367,30 @@ gfc_match_oacc_routine (void)
 	  != MATCH_YES))
 return MATCH_ERROR;
 
+  /* Scan for invalid routine geometry.  */
   dims = gfc_oacc_routine_dims (c);
   if (dims == OACC_FUNCTION_NONE)
 {
-  gfc_error ("Multiple loop axes specified for routine %C");
-  gfc_current_locus = old_loc;
-  return MATCH_ERROR;
+  gfc_error ("Multiple loop axes specified in !$ACC ROUTINE at %C");
+
+  /* Don't abort early, because it's important to let the user
+	 know of any potential duplicate routine directives.  */
+  seen_error = true;
 }
 
-  if (sym != NULL)
+  if (isym != NULL)
+{
+  if (c && (c->gang || c->worker || c->vector))
+	{
+	  gfc_error ("Intrinsic symbol specified in !$ACC ROUTINE ( NAME ) "
+		 "at %C, with incompatible clauses specifying the level "
+		 "of parallelism");
+	  goto cleanup;
+	}
+  /* The intrinsic symbol has been marked with a SEQ, or with no clause at
+	 all, which is OK.  */
+}
+  else if (sym != NULL)
 {
   bool needs_entry = true;
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/fixed-1.f b/gcc/testsuite/gfortran.dg/goacc/fixed-1.f
index 974f2702260..3a900c5b4e6 100644
--- a/gcc/testsuite/gfortran.dg/goacc/fixed-1.f
+++ b/gcc/testsuite/gfortran.dg/goacc/fixed-1.f
@@ -1,3 +1,5 @@
+!$ACC ROUTINE(ABORT) SEQ
+
   INTEGER :: ARGC
   ARGC =

Re: [PATCH 0/2][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-10-02 Thread Peter Bergner
On 10/1/18 10:52 PM, Peter Bergner wrote:
> Now that we handle conflicts at definitions and the pic hard reg
> is set via a copy from the pic pseudo, my PATCH 2 is setup to
> handle exactly this scenario (ie, a copy between a pseudo and
> a hard reg).  I looked at the asm output from a build with both
> PATCH 1 and PATCH 2, and yes, it also does not add the conflict
> between the pic pseudo and pic hard reg, so our other option to
> fix PR87479 is to apply PATCH 2.  However, since PATCH 2 handles
> the pic pseudo and pic hard reg conflict itself, that means we
> don't need the special pic conflict code and it can be removed!
> I'm going to update PATCH 2 to remove that pic handling code
> and send it through bootstrap and regtesting.

Here is an updated PATCH 2 that adds the generic handling of copies between
pseudos and hard regs which obsoletes the special conflict handling of the
REAL_PIC_OFFSET_TABLE_REGNUM with the pic pseudo.  I have confirmed the
assembler generated by this patch for test case pr63534.c matches the code
generated before PATCH 1 was committed, so we are successfully removing the
copy of the pic pseudo into the pic hard reg with this patch.

I'm currently performing bootstrap and regtesting on powerpc64le-linux and
x86_64-linux.  H.J., could you please test this patch on i686 to verify it
doesn't expose any other problems there?  Otherwise, I'll take Jeff's
suggestion and attempt a build on gcc45, but it sounds like the results
will take a while.

Is this patch version ok for trunk assuming no regressions are found in
the testing mentioned above?

Peter


gcc/
PR rtl-optimization/86939
PR rtl-optimization/87479
* ira.h (copy_insn_p): New prototype.
* ira-lives.c (ignore_reg_for_conflicts): New static variable.
(make_hard_regno_dead): Don't add conflicts for register
ignore_reg_for_conflicts.
(make_object_dead): Likewise.
(copy_insn_p): New function.
(process_bb_node_lives): Set ignore_reg_for_conflicts for copies.
Remove special conflict handling of REAL_PIC_OFFSET_TABLE_REGNUM.
* lra-lives.c (ignore_reg_for_conflicts): New static variable.
(make_hard_regno_dead): Don't add conflicts for register
ignore_reg_for_conflicts.  Remove special conflict handling of
REAL_PIC_OFFSET_TABLE_REGNUM.  Remove now unused argument
check_pic_pseudo_p and update callers.
(mark_pseudo_dead): Don't add conflicts for register
ignore_reg_for_conflicts.
(process_bb_lives): Set ignore_reg_for_conflicts for copies.

gcc/testsuite/
* gcc.target/powerpc/pr86939.c: New test.

Index: gcc/ira.h
===
--- gcc/ira.h   (revision 264789)
+++ gcc/ira.h   (working copy)
@@ -210,6 +210,9 @@ extern void ira_adjust_equiv_reg_cost (u
 /* ira-costs.c */
 extern void ira_costs_c_finalize (void);
 
+/* ira-lives.c */
+extern rtx copy_insn_p (rtx_insn *);
+
 /* Spilling static chain pseudo may result in generation of wrong
non-local goto code using frame-pointer to address saved stack
pointer value after restoring old frame pointer value.  The
Index: gcc/ira-lives.c
===
--- gcc/ira-lives.c (revision 264789)
+++ gcc/ira-lives.c (working copy)
@@ -84,6 +84,10 @@ static int *allocno_saved_at_call;
supplemental to recog_data.  */
 static alternative_mask preferred_alternatives;
 
+/* If non-NULL, the source operand of a register to register copy for which
+   we should not add a conflict with the copy's destination operand.  */
+static rtx ignore_reg_for_conflicts;
+
 /* Record hard register REGNO as now being live.  */
 static void
 make_hard_regno_live (int regno)
@@ -101,6 +105,11 @@ make_hard_regno_dead (int regno)
 {
   ira_object_t obj = ira_object_id_map[i];
 
+  if (ignore_reg_for_conflicts != NULL_RTX
+ && REGNO (ignore_reg_for_conflicts)
+== (unsigned int) ALLOCNO_REGNO (OBJECT_ALLOCNO (obj)))
+   continue;
+
   SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
   SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
 }
@@ -154,12 +163,38 @@ static void
 make_object_dead (ira_object_t obj)
 {
   live_range_t lr;
+  int ignore_regno = -1;
+  int last_regno = -1;
 
   sparseset_clear_bit (objects_live, OBJECT_CONFLICT_ID (obj));
 
+  /* Check whether any part of IGNORE_REG_FOR_CONFLICTS already conflicts
+ with OBJ.  */
+  if (ignore_reg_for_conflicts != NULL_RTX
+  && REGNO (ignore_reg_for_conflicts) < FIRST_PSEUDO_REGISTER)
+{
+  last_regno = END_REGNO (ignore_reg_for_conflicts);
+  int src_regno = ignore_regno = REGNO (ignore_reg_for_conflicts);
+
+  while (src_regno < last_regno)
+   {
+ if (TEST_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), src_regno))
+   {
+ ignore_regno = last_regno = -1;
+   

[patch,openacc] Check for sufficient parallelism when calling acc routines in Fortran

2018-10-02 Thread Cesar Philippidis
This patch updates the Fortran FE OpenACC routine parser to enforce the
new OpenACC 2.5 routine directive semantics. In addition to emitting a
warning when the user doesn't specify a gang, worker or vector clause,
it also clarifies some error messages and introduces a new error when
the user tries to use an acc routine with insufficient parallelism,
e.g., calling a gang routine from a vector loop.

Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
Linux with nvptx offloading.

Thanks,
Cesar
[OpenACC] Check for sufficient parallelism when calling acc routines in fortran

2018-XX-YY  Cesar Philippidis  

	gcc/fortran/
	* gfortran.h (gfc_resolve_oacc_routine_call): Declare.
	(gfc_resolve_oacc_routines): Declare.
	* openmp.c (gfc_match_oacc_routine): Make error reporting more
	precise.  Defer rejection of non-function and subroutine symbols
	until gfc_resolve_oacc_routines.
	(struct fortran_omp_context): Add a dims member.
	(gfc_resolve_oacc_blocks): Update ctx->dims.
	(gfc_resolve_oacc_routine_call): New function.
	(gfc_resolve_oacc_routines): New function.
	* resolve.c (resolve_function): Call gfc_resolve_oacc_routine_call.
	(resolve_call): Likewise.
	(resolve_codes): Call gfc_resolve_oacc_routines.

	gcc/testsuite/
	* gfortran.dg/goacc/routine-10.f90: New test.
	* gfortran.dg/goacc/routine-9.f90: New test.
	* gfortran.dg/goacc/routine-nested-parallelism.f: New test.
	* gfortran.dg/goacc/routine-nested-parallelism.f90: New test.

(cherry picked from gomp-4_0-branch r239784)
(cherry picked from gomp-4_0-branch r247353)

---
 gcc/fortran/gfortran.h|   2 +
 gcc/fortran/openmp.c  | 108 +-
 gcc/fortran/resolve.c |  11 +
 .../gfortran.dg/goacc/routine-10.f90  |   6 +
 gcc/testsuite/gfortran.dg/goacc/routine-9.f90 |  96 +
 .../goacc/routine-nested-parallelism.f| 340 ++
 .../goacc/routine-nested-parallelism.f90  | 340 ++
 7 files changed, 887 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-10.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-9.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-nested-parallelism.f
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-nested-parallelism.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 781dc2a7d17..87f98bbd110 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3166,6 +3166,8 @@ void gfc_resolve_oacc_directive (gfc_code *, gfc_namespace *);
 void gfc_resolve_oacc_declare (gfc_namespace *);
 void gfc_resolve_oacc_parallel_loop_blocks (gfc_code *, gfc_namespace *);
 void gfc_resolve_oacc_blocks (gfc_code *, gfc_namespace *);
+void gfc_resolve_oacc_routine_call (gfc_symbol *, locus *); 
+void gfc_resolve_oacc_routines (gfc_namespace *);
 
 /* expr.c */
 void gfc_free_actual_arglist (gfc_actual_arglist *);
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 58cbe0ae90c..5850538c1f0 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -2319,7 +2319,13 @@ gfc_match_oacc_routine (void)
 	{
 	  if ((isym = gfc_find_function (buffer)) == NULL
 	  && (isym = gfc_find_subroutine (buffer)) == NULL)
-	st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
+	{
+	  st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
+	  if (st == NULL && gfc_current_ns->proc_name->attr.contained
+		  && gfc_current_ns->parent)
+		st = gfc_find_symtree (gfc_current_ns->parent->sym_root,
+   buffer);
+	}
 	  if (st)
 	{
 	  sym = st->n.sym;
@@ -2327,18 +2333,12 @@ gfc_match_oacc_routine (void)
 		  && strcmp (sym->name, gfc_current_ns->proc_name->name) == 0)
 	sym = NULL;
 	}
-
-	  if ((isym == NULL && st == NULL)
-	  || (sym
-		  && !sym->attr.external
-		  && !sym->attr.function
-		  && !sym->attr.subroutine))
+	  else if (isym == NULL)
 	{
-	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, "
-			 "invalid function name %s",
-			 (sym) ? sym->name : buffer);
-	  gfc_current_locus = old_loc;
-	  return MATCH_ERROR;
+	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %L, "
+			 "invalid function name %qs", &old_loc, buffer);\
+	  goto cleanup;
+
 	}
 
 	  /* Set sym to NULL if it matches the current procedure's
@@ -2371,20 +2371,27 @@ gfc_match_oacc_routine (void)
   dims = gfc_oacc_routine_dims (c);
   if (dims == OACC_FUNCTION_NONE)
 {
-  gfc_error ("Multiple loop axes specified in !$ACC ROUTINE at %C");
+  gfc_error ("Multiple loop axes specified in !$ACC ROUTINE at %L",
+		 &old_loc);
 
   /* Don't abort early, because it's important to let the user
 	 know of any potential duplicate routine directives.  */
   seen_error = true;
 }
+  else if (dims == OACC_FUNCTION_AUTO)
+{
+  gfc_warning (0, "Expected one of %, %, % or "
+		   "% clauses in !$ACC ROUTI

Re: libgo patch committed: Update to 1.11 release

2018-10-02 Thread Ian Lance Taylor
On Fri, Sep 28, 2018 at 12:05 AM, Uros Bizjak  wrote:
> On Wed, Sep 26, 2018 at 9:57 AM, Uros Bizjak  wrote:
>> I've committed a patch to update libgo to the 1.11 release.  As usual
>> for these updates, the patch is too large to attach to this e-mail
>> message.  I've attached some of the more relevant directories.  This
>> update required some minor patches to the gotools directory and the Go
>> testsuite, also included here.  Bootstrapped and ran Go testsuite on
>> x86_64-pc-linux-gnu.  Committed to mainline.
>
> There is one new libgo failure on CentOS 5.11:
>
> --- FAIL: TestSplice (0.18s)
> --- FAIL: TestSplice/readerAtEOF (0.01s)
> splice_test.go:228: closed connection: got err = pipe2:
> function not implemented, handled = false, want handled = true
> FAIL
> FAIL: net
>
> as there is no pipe2 on old systems.

This isn't a real failure, only a testsuite failure.  I believe that
this patch will fix it.  The test is assuming that the unexported
slice function will handle the splice, but if pipe2 does not work then
it doesn't.  The relevant code in internal/poll/splice_linux.go says
"Falling back to pipe is possible, but prior to 2.6.29 splice returns
-EAGAIN instead of 0 when the connection is closed."  Bootstrapped and
ran Go tests on x86_64-pc-linux-gnu (on a newer kernel).  Committed to
mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 264773)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-53d0d7ca278a5612fcdb5fb098e7bf950a0178ef
+098e36f4ddfcf50aeb34509b5f25b86d7050749c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/net/splice_test.go
===
--- libgo/go/net/splice_test.go (revision 264648)
+++ libgo/go/net/splice_test.go (working copy)
@@ -11,7 +11,9 @@ import (
"fmt"
"io"
"io/ioutil"
+   "os"
"sync"
+   "syscall"
"testing"
 )
 
@@ -225,6 +227,10 @@ func testSpliceReaderAtEOF(t *testing.T)
serverUp.Close()
_, err, handled := splice(serverDown.(*TCPConn).fd, serverUp)
if !handled {
+   if serr, ok := err.(*os.SyscallError); ok && serr.Syscall == 
"pipe2" && serr.Err == syscall.ENOSYS {
+   t.Skip("pipe2 not supported")
+   }
+
t.Errorf("closed connection: got err = %v, handled = %t, want 
handled = true", err, handled)
}
lr := &io.LimitedReader{


[patch,openacc] Add warning for unused acc routine parallelism

2018-10-02 Thread Cesar Philippidis
This patch teaches omp-general to be a little more verbose when it comes
time to report the missing usage of gang, worker, and vector clauses
on acc routines. As before, the Fortran FE does this directly so that it
can handle modules. Therefore, this primarily handles the C and C++ cases
(although certain Fortran routines fall through to this).

Is this OK for trunk? I bootstrapped and regtested it for x86_64 Linux
with nvptx offloading. This patch only touches the OpenACC code path.
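
To make the intended behaviour concrete, here is a minimal, hypothetical C
example (not part of the patch or its testsuite): the first routine has no
level-of-parallelism clause and would get the new warning plus the note about
falling back to seq, while the second is accepted silently.

/* No gang/worker/vector/seq clause: warned, then treated as "seq".  */
#pragma acc routine
void
scale_missing_clause (int n, float a, float *x)
{
  for (int i = 0; i < n; i++)
    x[i] *= a;
}

/* Explicit clause: no diagnostic.  */
#pragma acc routine seq
void
scale_seq (int n, float a, float *x)
{
  for (int i = 0; i < n; i++)
    x[i] *= a;
}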

Thanks,
Cesar
[OpenACC] Add warning for unused acc routine parallelism

(was [OpenACC] Don't error on implicitly private induction variables in gfortran)

2018-XX-YY  Cesar Philippidis  

	gcc/
	* omp-general.c (oacc_verify_routine_clauses): New warning.

	gcc/testsuite/
	* c-c++-common/goacc-gomp/nesting-fail-1.c: Update test.
	* c-c++-common/goacc/Wparentheses-1.c: Likewise.
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size-2.c: Likewise.
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: Likewise.
	* c-c++-common/goacc/nesting-fail-1.c: Likewise.
	* c-c++-common/goacc/routine-1.c: Likewise.
	* c-c++-common/goacc/routine-level-of-parallelism-1.c: Likewise.
	* c-c++-common/goacc/routine-level-of-parallelism-2.c: Likewise.
	* c-c++-common/goacc/routine-nohost-1.c: Likewise.
	* c-c++-common/goacc/routine-nohost-2.c: Likewise.
	* g++.dg/goacc/routine-1.C: Likewise.
	* g++.dg/goacc/routine-2.C: Likewise.
	* gfortran.dg/goacc/pr72741-2.f: Likewise.
	* gfortran.dg/goacc/routine-9.f90: Likewise.
	* gfortran.dg/goacc/routine-without-clauses.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-2.c: Update test.
	* testsuite/libgomp.oacc-c-c++-common/declare-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/declare-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/host_data-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-loop-2.h:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-3.c: Likewise.

(cherry picked from gomp-4_0-branch r244980)
---
 gcc/omp-general.c |  6 ++-
 .../c-c++-common/goacc-gomp/nesting-fail-1.c  |  4 +-
 .../c-c++-common/goacc/Wparentheses-1.c   |  4 +-
 .../goacc/builtin-goacc-parlevel-id-size-2.c  |  2 +
 .../goacc/builtin-goacc-parlevel-id-size.c|  2 +
 .../c-c++-common/goacc/nesting-fail-1.c   |  2 +-
 gcc/testsuite/c-c++-common/goacc/routine-1.c  |  4 ++
 .../goacc/routine-level-of-parallelism-1.c|  8 ++--
 .../goacc/routine-level-of-parallelism-2.c| 34 
 .../c-c++-common/goacc/routine-nohost-1.c | 20 +-
 .../c-c++-common/goacc/routine-nohost-2.c | 40 +--
 gcc/testsuite/g++.dg/goacc/routine-1.C|  6 +--
 gcc/testsuite/g++.dg/goacc/routine-2.C| 10 ++---
 gcc/testsuite/gfortran.dg/goacc/pr72741-2.f   |  4 +-
 gcc/testsuite/gfortran.dg/goacc/routine-9.f90 | 22 +-
 .../goacc/routine-without-clauses.f90 | 34 
 .../libgomp.oacc-c-c++-common/declare-2.c |  4 +-
 .../libgomp.oacc-c-c++-common/declare-3.c |  2 +-
 .../libgomp.oacc-c-c++-common/declare-4.c |  2 +-
 .../libgomp.oacc-c-c++-common/host_data-1.c   |  2 +-
 .../loop-dim-default.c|  2 +-
 .../mode-transitions.c|  2 +-
 .../parallel-loop-2.h |  2 +-
 .../libgomp.oacc-c-c++-common/routine-1.c |  2 +-
 .../libgomp.oacc-c-c++-common/routine-3.c |  2 +-
 25 files changed, 132 insertions(+), 90 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-without-clauses.f90

diff --git a/gcc/omp-general.c b/gcc/omp-general.c
index 5c91ce73a50..d290766329f 100644
--- a/gcc/omp-general.c
+++ b/gcc/omp-general.c
@@ -613,8 +613,10 @@ oacc_verify_routine_clauses (tree fndecl, tree *clauses, location_t loc,
   }
   if (c_level == NULL_TREE)
 {
-  /* OpenACC 2.5 makes this an error; for the current OpenACC 2.0a
-	 implementation add an implicit "seq" clause.  */
+  /* OpenACC 2.5 expects the user to supply one parallelism clause.  */
+  warning_at (loc, 0, "expecting one of %<gang%>, %<worker%>, %<vector%> "
+		  "or %<seq%> clauses");
+  inform (loc, "assigning %<seq%> parallelism to this routine");
   c_level = build_omp_clause (loc, OMP_CLAUSE_SEQ);
   OMP_CLAUSE_CHAIN (c_level) = *clauses;
   *clauses = c_level;
diff --git a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
index 1a3324200e2..57eaa0296d6 100644
--- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
+++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
@@ -362,7 +362,7 @@ f_acc_data (void)
   }
 }
 
-#pragma acc routine
+#pragma acc routine seq
 void
 f_acc_loop (void)
 {
@@ -436,7 +436,7 @@ f_acc_loop (void)
 }
 }
 
-#pragm

Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-02 Thread Qing Zhao


> On Oct 2, 2018, at 9:55 AM, Martin Liška  wrote:
 
 Affected functions: 5
   __ilog2_u64/132 (include/linux/log2.h:40:5)
   ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
   ablkcipher_request_alloc.constprop.8/3198 (include/linux/crypto.h:979:82)
   helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
   helper_rfc4106_encrypt/3006 (arch/x86/crypto/aesni-intel_glue.c:939:12)
 [..skipped..]
 
 
 if we want to patch the function “fls64/63”,  what else functions we need 
 to patch, too? my guess is:
>>> 
>>> Hi.
>>> 
>>> Yes, 'Affected functions' is exactly the list of functions you want to 
>>> patch.
>>> 
 
 **all the callers:
 __ilog2_u64/132
 ablkcipher_request_alloc/1639
 helper_rfc4106_decrypt/3007
 helper_rfc4106_encrypt/3006 
 **and:
 ablkcipher_request_alloc.constprop.8/3198
 is the above correct?
 
 how to generate patch for ablkcipher_request_alloc.constprop.8/3198? since 
 it’s not a function in the source code?
>>> 
>>> Well, it's a 'static inline' function in a header file thus the function 
>>> will be inlined in all usages.
>>> In this particular case there's no such caller function, so you're fine.
>> 
>> So, for cloned functions, you have to analyze them case by case manually to 
>> see their callers?
> 
> No, the tool should provide complete list of affected functions.

So, the tool will provide the callers of the cloned routine? Then we will
patch the callers of the cloned routine, not the cloned routine itself?

> 
>> why not just disable ipa-cp or ipa-sra completely?
> 
> Because the optimizations create function clones, which are trackable with my 
> tool
> and one knows then all affected functions.
Okay. I see. 
> 
> You can disable the optimizations, but you'll miss some performance benefit 
> provide
> by compiler.
> 
> Note that as Martin Jambor mentioned in point 2) there are also IPA 
> optimizations that
> do not create clones. These should be listed and eventually disabled for 
> kernel live
> patching.

Yes, such IPA analyses should be disabled.  We need to identify a complete list
of such analyses.

thanks.

Qing



PING * 2: Re: VRP: special case all pointer conversion code

2018-10-02 Thread Aldy Hernandez

PING * 2


 Forwarded Message 
Subject: PING: Re: VRP: special case all pointer conversion code
Date: Wed, 26 Sep 2018 13:12:19 -0400
From: Aldy Hernandez 
To: gcc-patches 
CC: Jeff Law 

PING

On 9/17/18 6:12 AM, Aldy Hernandez wrote:
It seems most of the remaining anti-range code in 
extract_range_from_unary_expr for CONVERT_EXPR_P is actually dealing 
with non-nullness in practice.


Anti-ranges are mostly handled by canonicalizing them into 
their two constituent sets (~[10,20] => [MIN,9] U [21,MAX]) and dealing 
with them piece-meal.  For that matter, the only way we can reach the 
conversion code in extract_range_from_unary_expr with an anti-range is 
either with a pointer (because pointers are ignored from 
ranges_from_anti_range on purpose), or when converting integers of the 
form ~[SSA, SSA].  I verified this with a bootstrap + tests with some 
specially placed asserts, BTW.


So... if we special handle pointer conversions (both to and fro, as 
opposed to just to), we get rid of any anti-ranges with the exception of 
~[SSA, SSA] between integers.  And anti-ranges of unknown quantities 
(SSAs) will be handled automatically already (courtesy of 
extract_range_into_wide_ints).


I propose we handle pointers at the beginning, and everything else just 
falls into place, with no special code.


As commented in the code, this will pessimize conversions from (char 
*)~[0, 2] to int, because we will forget that the range can also not be 
1 or 2.  But as Jeff commented, we really only care about null or 
non-nullness.  Special handling of magic pointers with constants is IMO a 
wasted effort.  For that matter, I think it was me that added this 
spaghetti a few weeks ago to make sure we handled ~[0,2].  We weren't 
even handling it a short while back :-).  Furthermore, in a bootstrap, I 
think we only triggered this twice.  And I'm not even sure we make 
further use of anything null/not-null for pointers later on.
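
As a concrete, hypothetical illustration of that trade-off (user source code,
not something taken from the patch), function name and constants are made up:

void
f (char *p)
{
  if (p == (char *) 0 || p == (char *) 1 || p == (char *) 2)
    return;

  /* Here p carries the anti-range ~[0, 2].  */
  long v = (long) p;

  /* Under the proposed handling only the null / non-null fact is kept
     across the conversion; that v can also not be 1 or 2 is forgotten,
     which is the pessimization described above.  */
  if (v == 1)	/* may no longer be folded away */
    __builtin_abort ();
}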


This patch simplifies the code, and removes more special handling and 
cryptic comments related to anti-ranges.


Tested with all languages including Ada and Go.

OK for trunk?



[PATCH, i386]: Perform ix86_emit_i387_round calculations in XFmode

2018-10-02 Thread Uros Bizjak
The emitted assembly is actually the same.

2018-10-02  Uros Bizjak  

* config/i386/i386.c (ix86_emit_i387_round): Extend op1 to XFmode
before emitting fxam.  Perform calculations in XFmode.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
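
For reference, the identity the expander implements, round(a) = sgn(a) *
floor(fabs(a) + 0.5), written as a plain C sketch (illustrative only, not
part of the patch; the helper name is made up):

double
round_via_floor (double a)
{
  double r = __builtin_floor (__builtin_fabs (a) + 0.5);
  return __builtin_signbit (a) ? -r : r;
}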
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 55bf18b171cd..d35ad91b55c0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -43951,24 +43951,27 @@ void ix86_emit_i387_round (rtx op0, rtx op1)
 {
   machine_mode inmode = GET_MODE (op1);
   machine_mode outmode = GET_MODE (op0);
-  rtx e1, e2, res, tmp, tmp1, half;
+  rtx e1 = gen_reg_rtx (XFmode);
+  rtx e2 = gen_reg_rtx (XFmode);
   rtx scratch = gen_reg_rtx (HImode);
   rtx flags = gen_rtx_REG (CCNOmode, FLAGS_REG);
+  rtx half = const_double_from_real_value (dconsthalf, XFmode);
+  rtx res = gen_reg_rtx (outmode);
   rtx_code_label *jump_label = gen_label_rtx ();
-  rtx insn;
-  rtx (*gen_abs) (rtx, rtx);
-  rtx (*gen_neg) (rtx, rtx);
+  rtx (*floor_insn) (rtx, rtx);
+  rtx (*neg_insn) (rtx, rtx);
+  rtx insn, tmp;
 
   switch (inmode)
 {
 case E_SFmode:
-  gen_abs = gen_abssf2;
-  break;
 case E_DFmode:
-  gen_abs = gen_absdf2;
+  tmp = gen_reg_rtx (XFmode);
+
+  emit_insn (gen_rtx_SET (tmp, gen_rtx_FLOAT_EXTEND (XFmode, op1)));
+  op1 = tmp;
   break;
 case E_XFmode:
-  gen_abs = gen_absxf2;
   break;
 default:
   gcc_unreachable ();
@@ -43977,84 +43980,61 @@ void ix86_emit_i387_round (rtx op0, rtx op1)
   switch (outmode)
 {
 case E_SFmode:
-  gen_neg = gen_negsf2;
+  floor_insn = gen_frndintxf2_floor;
+  neg_insn = gen_negsf2;
   break;
 case E_DFmode:
-  gen_neg = gen_negdf2;
+  floor_insn = gen_frndintxf2_floor;
+  neg_insn = gen_negdf2;
   break;
 case E_XFmode:
-  gen_neg = gen_negxf2;
+  floor_insn = gen_frndintxf2_floor;
+  neg_insn = gen_negxf2;
   break;
 case E_HImode:
-  gen_neg = gen_neghi2;
+  floor_insn = gen_lfloorxfhi2;
+  neg_insn = gen_neghi2;
   break;
 case E_SImode:
-  gen_neg = gen_negsi2;
+  floor_insn = gen_lfloorxfsi2;
+  neg_insn = gen_negsi2;
   break;
 case E_DImode:
-  gen_neg = gen_negdi2;
+  floor_insn = gen_lfloorxfdi2;
+  neg_insn = gen_negdi2;
   break;
 default:
   gcc_unreachable ();
 }
 
-  e1 = gen_reg_rtx (inmode);
-  e2 = gen_reg_rtx (inmode);
-  res = gen_reg_rtx (outmode);
-
-  half = const_double_from_real_value (dconsthalf, inmode);
-
   /* round(a) = sgn(a) * floor(fabs(a) + 0.5) */
 
   /* scratch = fxam(op1) */
-  emit_insn (gen_rtx_SET (scratch,
- gen_rtx_UNSPEC (HImode, gen_rtvec (1, op1),
- UNSPEC_FXAM)));
+  emit_insn (gen_fxamxf2_i387 (scratch, op1));
+
   /* e1 = fabs(op1) */
-  emit_insn (gen_abs (e1, op1));
+  emit_insn (gen_absxf2 (e1, op1));
 
   /* e2 = e1 + 0.5 */
-  half = force_reg (inmode, half);
-  emit_insn (gen_rtx_SET (e2, gen_rtx_PLUS (inmode, e1, half)));
+  half = force_reg (XFmode, half);
+  emit_insn (gen_rtx_SET (e2, gen_rtx_PLUS (XFmode, e1, half)));
 
   /* res = floor(e2) */
-  if (inmode != XFmode)
-{
-  tmp1 = gen_reg_rtx (XFmode);
-
-  emit_insn (gen_rtx_SET (tmp1, gen_rtx_FLOAT_EXTEND (XFmode, e2)));
-}
-  else
-tmp1 = e2;
-
   switch (outmode)
 {
 case E_SFmode:
 case E_DFmode:
   {
-   rtx tmp0 = gen_reg_rtx (XFmode);
-
-   emit_insn (gen_frndintxf2_floor (tmp0, tmp1));
+   tmp = gen_reg_rtx (XFmode);
 
+   emit_insn (floor_insn (tmp, e2));
emit_insn (gen_rtx_SET (res,
-   gen_rtx_UNSPEC (outmode, gen_rtvec (1, tmp0),
+   gen_rtx_UNSPEC (outmode, gen_rtvec (1, tmp),
UNSPEC_TRUNC_NOOP)));
   }
   break;
-case E_XFmode:
-  emit_insn (gen_frndintxf2_floor (res, tmp1));
-  break;
-case E_HImode:
-  emit_insn (gen_lfloorxfhi2 (res, tmp1));
-  break;
-case E_SImode:
-  emit_insn (gen_lfloorxfsi2 (res, tmp1));
-  break;
-case E_DImode:
-  emit_insn (gen_lfloorxfdi2 (res, tmp1));
-   break;
 default:
-  gcc_unreachable ();
+  emit_insn (floor_insn (res, e2));
 }
 
   /* flags = signbit(a) */
@@ -44069,7 +44049,7 @@ void ix86_emit_i387_round (rtx op0, rtx op1)
   predict_jump (REG_BR_PROB_BASE * 50 / 100);
   JUMP_LABEL (insn) = jump_label;
 
-  emit_insn (gen_neg (res, res));
+  emit_insn (neg_insn (res, res));
 
   emit_label (jump_label);
   LABEL_NUSES (jump_label) = 1;


Re: [PATCH] warn for sprintf argument mismatches (PR 87034)

2018-10-02 Thread Jeff Law
On 8/21/18 8:10 PM, Martin Sebor wrote:
> It didn't seem like we were making progress in the debate about
> warning for sprintf argument mismatches earlier today so I took
> a couple of hours this afternoon to prototype one of the solutions
> I was trying to describe.  It mostly keeps the existing interface
> and just extends c_strlen() and the other functions to pass in
> an in-out argument describing the requested element size on input
> and the constant string on output.  The caller is responsible for
> validating the string to make sure its type matches the expected
> type.
> 
> String functions like strcpy interested in the size of their
> argument in bytes succeed even for a wide string argument and
> are candidates for folding (this matches the original behavior).
> The patch doesn't add any warning for mismatched calls to those
> (such as strcpy(d, (char*)L"123");) but enhancing it to do that
> would be just as "simple" as adding the missing nul detection.
> 
> Calls to sprintf "%s" and "%ls" also succeed with mismatched
> arguments but get a warning:
> 
> warning: ‘%s’ invalid directive argument type ‘int[4]’ [-Wformat-overflow=]
> 3 |   __builtin_sprintf (d, "%s", (char*)L"123");
>   |  ^~  ~~
> warning: ‘%ls’ invalid directive argument type ‘char[4]’
> [-Wformat-overflow=]
> 4 |   __builtin_sprintf (d, "%ls", (__WCHAR_TYPE__*)"123");
>   |  ^~~~
> 
> FWIW, I chose the approach of adding a c_strlen_data structure
> over adding yet another argument to c_strlen() and friends to
> keep the argument list from getting too long and confusing
> (like get_range_strlen).  This should also make it easy to
> retrofit the missing nul detection patch on top of it.
> 
> I didn't take any time to add tests for the restored strcpy
> (et al.) folding.
> 
> If this looks like the general direction we can agree on (perhaps
> with some tweaks, including not folding etc.) I will add those
> tests, plus more for the various argument mismatches.
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-87034.diff
> 
> PR tree-optimization/87034 - missing -Wformat-overflow on a sprintf %s with a 
> wide string
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/87034
>   * builtins.c (c_strlen): Add an argument.  Optionally return string
>   to caller.
>   * builtins.h (c_strlen): Add an argument.
>   * gimple-fold.c (get_range_strlen): Replace argument with
>   c_strlen_data *.
>   (get_maxval_strlen): Adjust call to get_range_strlen.
>   (gimple_fold_builtin_strlen): Same.
>   * gimple-fold.h (c_strlen_data): New struct.
>   (get_range_strlen): Add optional argument.
>   * gimple-ssa-sprintf.c (get_string_length): Change argument type.
>   (format_string): Same.  Adjust.
>   (format_directive): Diagnose incompatible arguments.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/87034
>   * gcc.dg/tree-ssa/builtin-sprintf-warn-20.c: Remove xfail.
So just now starting to look at this...  Sorry.

So hunks of this are no longer necessary as I ended up doing something
similar to c_strlen's API.  My c_strlen_data is out parameters only, but
obviously could be used for an in parameter like ELTSIZE.

>  
> -  /* Determine the size of the string element.  */
> -  if (eltsize != tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (src)
> -return NULL_TREE;
> +  unsigned eltsize = 1;
> +  if (pdata)
> +{
> +  eltsize = pdata->eltsize;
>  
> +  tree srctype = TREE_TYPE (TREE_TYPE (src));
> +  if (eltsize != tree_to_uhwi (TYPE_SIZE_UNIT (srctype)))
> + pdata->string = src;
> +}
> +
So if I'm reading this hunk correctly, we'll no longer return NULL_TREE
when presented with an ELTSIZE that is not equal to the TYPE_SIZE_UNIT.

That seems wrong.  Conceptually the return value from c_strlen should be
either the length as it would be computed by strlen or NULL_TREE
indicating it's a case we can't handle (lack of NUL termination,
mismatches on sizes of the elements, etc).  In those cases where we
return NULL_TREE, callers can dig into the c_strlen_data to get
additional information for diagnostics and such.  We don't currently
provide the actual string within c_strlen_data, but we easily could.  I
think that's the biggest thing you need IIUC your patch correctly.
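
To make that contract concrete, here is a rough sketch of the shape being
discussed, using stand-in types (purely illustrative; the real interface
works on GCC tree nodes and NULL_TREE, and the names below are made up):

typedef const char *node;	/* stand-in for a GCC 'tree' */

struct strlen_data
{
  unsigned eltsize;		/* in: requested element size */
  node string;			/* out: string node, for diagnostics */
};

/* Return the length, or -1 (the "NULL_TREE" case) when the string cannot
   be handled; on failure the caller digs into DATA to build a warning.  */
static long
toy_strlen (node s, struct strlen_data *data)
{
  if (data->eltsize != 1)	/* element size does not match this string */
    {
      data->string = s;		/* let the caller diagnose the mismatch */
      return -1;
    }

  long n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}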



> Index: gcc/gimple-ssa-sprintf.c
> ===
> --- gcc/gimple-ssa-sprintf.c  (revision 263754)
> +++ gcc/gimple-ssa-sprintf.c  (working copy)
[ ... ]
There will be some textual conflicts with your #4/6, but nothing that
shouldn't be easily resolvable.


> @@ -2153,9 +2157,21 @@ format_string (const directive &dir, tree arg, vr_
>  {
>fmtresult res;
>  
> -  /* Compute the range the argument's length can be in.  */
> -  int count_by = dir.specifier == 'S' || dir.modifier == FMT_LEN_l ? 4 : 1;
> -  fmtresult slen = get_string_length (arg, count_by);
>

Re: [PATCH] Properly mark lambdas in GCOV (PR gcov-profile/86109).

2018-10-02 Thread Jeff Law
On 9/12/18 6:39 AM, Martin Liška wrote:
> Hi.
> 
> This is follow-up of:
> https://gcc.gnu.org/ml/gcc/2018-08/msg7.html
> 
> I've chosen to implement that with new DECL_CXX_LAMBDA_FUNCTION that
> uses an empty bit in tree_function_decl.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready for trunk?
> 
> gcc/ChangeLog:
> 
> 2018-09-12  Martin Liska  
> 
>   PR gcov-profile/86109
>   * coverage.c (coverage_begin_function): Do not
>   mark lambdas as artificial.
>   * tree-core.h (struct GTY): Remove tm_clone_flag
>   and introduce new lambda_function.
>   * tree.h (DECL_CXX_LAMBDA_FUNCTION): New macro.
> 
> gcc/cp/ChangeLog:
> 
> 2018-09-12  Martin Liska  
> 
>   PR gcov-profile/86109
>   * parser.c (cp_parser_lambda_declarator_opt):
>   Set DECL_CXX_LAMBDA_FUNCTION for lambdas.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-09-12  Martin Liska  
> 
>   PR gcov-profile/86109
>   * g++.dg/gcov/pr86109.C: New test.
So the concern here is C++-isms bleeding into the language independent
nodes.  I think a name change from DECL_CXX_LAMBDA_FUNCTION to something
else would be enough to go forward.

jeff


[PATCH, i386]: Remove isinf patterns

2018-10-02 Thread Uros Bizjak
Generic builtins are better also for x87.
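
For example (illustrative user code, not part of the patch), a call like the
following now goes through the generic __builtin_isinf expansion instead of
the removed i386 isinf expanders:

int
is_infinite (long double x)
{
  return __builtin_isinf (x);
}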

2018-10-02  Uros Bizjak  

* config/i386/i386.md (fxam2_i387_with_temp): Remove.
(isinfxf2): Ditto.
(isinf2): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 91947518119e..367e9bfe255b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16456,81 +16456,6 @@
(set_attr "unit" "i387")
(set_attr "mode" "")])
 
-(define_insn_and_split "fxam2_i387_with_temp"
-  [(set (match_operand:HI 0 "register_operand")
-   (unspec:HI
- [(match_operand:MODEF 1 "memory_operand")]
- UNSPEC_FXAM_MEM))]
-  "TARGET_USE_FANCY_MATH_387
-   && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(set (match_dup 2)(match_dup 1))
-   (set (match_dup 0)
-   (unspec:HI [(match_dup 2)] UNSPEC_FXAM))]
-{
-  operands[2] = gen_reg_rtx (mode);
-
-  MEM_VOLATILE_P (operands[1]) = 1;
-}
-  [(set_attr "type" "multi")
-   (set_attr "unit" "i387")
-   (set_attr "mode" "")])
-
-(define_expand "isinfxf2"
-  [(use (match_operand:SI 0 "register_operand"))
-   (use (match_operand:XF 1 "register_operand"))]
-  "TARGET_USE_FANCY_MATH_387
-   && ix86_libc_has_function (function_c99_misc)"
-{
-  rtx mask = GEN_INT (0x45);
-  rtx val = GEN_INT (0x05);
-
-  rtx scratch = gen_reg_rtx (HImode);
-  rtx res = gen_reg_rtx (QImode);
-
-  emit_insn (gen_fxamxf2_i387 (scratch, operands[1]));
-
-  emit_insn (gen_andqi_ext_1 (scratch, scratch, mask));
-  emit_insn (gen_cmpqi_ext_3 (scratch, val));
-  ix86_expand_setcc (res, EQ,
-gen_rtx_REG (CCmode, FLAGS_REG), const0_rtx);
-  emit_insn (gen_zero_extendqisi2 (operands[0], res));
-  DONE;
-})
-
-(define_expand "isinf2"
-  [(use (match_operand:SI 0 "register_operand"))
-   (use (match_operand:MODEF 1 "nonimmediate_operand"))]
-  "TARGET_USE_FANCY_MATH_387
-   && ix86_libc_has_function (function_c99_misc)
-   && !(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)"
-{
-  rtx mask = GEN_INT (0x45);
-  rtx val = GEN_INT (0x05);
-
-  rtx scratch = gen_reg_rtx (HImode);
-  rtx res = gen_reg_rtx (QImode);
-
-  /* Remove excess precision by forcing value through memory. */
-  if (memory_operand (operands[1], VOIDmode))
-emit_insn (gen_fxam2_i387_with_temp (scratch, operands[1]));
-  else
-{
-  rtx temp = assign_386_stack_local (mode, SLOT_TEMP);
-
-  emit_move_insn (temp, operands[1]);
-  emit_insn (gen_fxam2_i387_with_temp (scratch, temp));
-}
-
-  emit_insn (gen_andqi_ext_1 (scratch, scratch, mask));
-  emit_insn (gen_cmpqi_ext_3 (scratch, val));
-  ix86_expand_setcc (res, EQ,
-gen_rtx_REG (CCmode, FLAGS_REG), const0_rtx);
-  emit_insn (gen_zero_extendqisi2 (operands[0], res));
-  DONE;
-})
-
 (define_expand "signbittf2"
   [(use (match_operand:SI 0 "register_operand"))
(use (match_operand:TF 1 "register_operand"))]


Re: [PATCH 0/2][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-10-02 Thread H.J. Lu
On Tue, Oct 2, 2018 at 7:59 AM Peter Bergner  wrote:
>
> On 10/1/18 10:52 PM, Peter Bergner wrote:
> > Now that we handle conflicts at definitions and the pic hard reg
> > is set via a copy from the pic pseudo, my PATCH 2 is setup to
> > handle exactly this scenario (ie, a copy between a pseudo and
> > a hard reg).  I looked at the asm output from a build with both
> > PATCH 1 and PATCH 2, and yes, it also does not add the conflict
> > between the pic pseudo and pic hard reg, so our other option to
> > fix PR87479 is to apply PATCH 2.  However, since PATCH 2 handles
> > the pic pseudo and pic hard reg conflict itself, that means we
> > don't need the special pic conflict code and it can be removed!
> > I'm going to update PATCH 2 to remove that pic handling code
> > and send it through bootstrap and regtesting.
>
> Here is an updated PATCH 2 that adds the generic handling of copies between
> pseudos and hard regs which obsoletes the special conflict handling of the
> REAL_PIC_OFFSET_TABLE_REGNUM with the pic pseudo.  I have confirmed the
> assembler generated by this patch for test case pr63534.c matches the code
> generated before PATCH 1 was committed, so we are successfully removing the
> copy of the pic pseudo into the pic hard reg with this patch.
>
> I'm currently performing bootstrap and regtesting on powerpc64le-linux and
> x86_64-linux.  H.J., could you please test this patch on i686 to verify it
> doesn't expose any other problems there?  Otherwise, I'll take Jeff's

I am waiting for the result of your previous patch.  I will test this one after
that.

> suggestion and attempt a build on gcc45, but it sounds like the results
> will take a while.
>
> Is this patch version ok for trunk assuming no regressions are found in
> the testing mentioned above?
>
> Peter
>
>
> gcc/
> PR rtl-optimization/86939
> PR rtl-optimization/87479
> * ira.h (copy_insn_p): New prototype.
> * ira-lives.c (ignore_reg_for_conflicts): New static variable.
> (make_hard_regno_dead): Don't add conflicts for register
> ignore_reg_for_conflicts.
> (make_object_dead): Likewise.
> (copy_insn_p): New function.
> (process_bb_node_lives): Set ignore_reg_for_conflicts for copies.
> Remove special conflict handling of REAL_PIC_OFFSET_TABLE_REGNUM.
> * lra-lives.c (ignore_reg_for_conflicts): New static variable.
> (make_hard_regno_dead): Don't add conflicts for register
> ignore_reg_for_conflicts.  Remove special conflict handling of
> REAL_PIC_OFFSET_TABLE_REGNUM.  Remove now unused argument
> check_pic_pseudo_p and update callers.
> (mark_pseudo_dead): Don't add conflicts for register
> ignore_reg_for_conflicts.
> (process_bb_lives): Set ignore_reg_for_conflicts for copies.
>


-- 
H.J.


[PATCH 1/2] S/390: Rename arch12 to z14

2018-10-02 Thread Andreas Krebbel
This is a mechanical change not impacting code generation.  With that
patch I try to hide the artificial CPU name arch12 which we had to use
before the announcement of the IBM z14 machine.  arch12 of course
stays a valid option to -march and -mtune.  So this is just about
making the code somewhat easier to read.

gcc/ChangeLog:

2018-10-02  Andreas Krebbel  

* common/config/s390/s390-common.c: Rename PF_ARCH12 to PF_Z14.
* config/s390/s390.h (enum processor_flags): Rename PF_ARCH12 to
PF_Z14.  Rename TARGET_CPU_ARCH12 to TARGET_CPU_Z14,
TARGET_CPU_ARCH12_P to TARGET_CPU_Z14_P, TARGET_ARCH12 to
TARGET_Z14, and TARGET_ARCH12_P to TARGET_Z14_P.
* config/s390/s390.md: Likewise. Rename also the cpu attribute
value from arch12 to z14.
---
 gcc/common/config/s390/s390-common.c |  4 ++--
 gcc/config/s390/s390.h   | 16 
 gcc/config/s390/s390.md  | 26 +-
 3 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/gcc/common/config/s390/s390-common.c 
b/gcc/common/config/s390/s390-common.c
index a56443c..2f72895 100644
--- a/gcc/common/config/s390/s390-common.c
+++ b/gcc/common/config/s390/s390-common.c
@@ -44,9 +44,9 @@ EXPORTED_CONST int processor_flags_table[] =
 /* z13 */PF_IEEE_FLOAT | PF_ZARCH | PF_LONG_DISPLACEMENT
  | PF_EXTIMM | PF_DFP | PF_Z10 | PF_Z196 | PF_ZEC12 | PF_TX
  | PF_Z13 | PF_VX,
-/* arch12 */ PF_IEEE_FLOAT | PF_ZARCH | PF_LONG_DISPLACEMENT
+/* z14 */PF_IEEE_FLOAT | PF_ZARCH | PF_LONG_DISPLACEMENT
  | PF_EXTIMM | PF_DFP | PF_Z10 | PF_Z196 | PF_ZEC12 | PF_TX
- | PF_Z13 | PF_VX | PF_VXE | PF_ARCH12
+ | PF_Z13 | PF_VX | PF_VXE | PF_Z14
   };
 
 /* Change optimizations to be performed, depending on the
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 4fb32b8..bf40b4c 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -38,7 +38,7 @@ enum processor_flags
   PF_TX = 256,
   PF_Z13 = 512,
   PF_VX = 1024,
-  PF_ARCH12 = 2048,
+  PF_Z14 = 2048,
   PF_VXE = 4096
 };
 
@@ -90,10 +90,10 @@ enum processor_flags
(s390_arch_flags & PF_VX)
 #define TARGET_CPU_VX_P(opts) \
(opts->x_s390_arch_flags & PF_VX)
-#define TARGET_CPU_ARCH12 \
-   (s390_arch_flags & PF_ARCH12)
-#define TARGET_CPU_ARCH12_P(opts) \
-   (opts->x_s390_arch_flags & PF_ARCH12)
+#define TARGET_CPU_Z14 \
+   (s390_arch_flags & PF_Z14)
+#define TARGET_CPU_Z14_P(opts) \
+   (opts->x_s390_arch_flags & PF_Z14)
 #define TARGET_CPU_VXE \
(s390_arch_flags & PF_VXE)
 #define TARGET_CPU_VXE_P(opts) \
@@ -143,9 +143,9 @@ enum processor_flags
(TARGET_ZARCH_P (opts->x_target_flags) && TARGET_CPU_VX_P (opts) \
 && TARGET_OPT_VX_P (opts->x_target_flags) \
 && TARGET_HARD_FLOAT_P (opts->x_target_flags))
-#define TARGET_ARCH12 (TARGET_ZARCH && TARGET_CPU_ARCH12)
-#define TARGET_ARCH12_P(opts)  \
-   (TARGET_ZARCH_P (opts->x_target_flags) && TARGET_CPU_ARCH12_P (opts))
+#define TARGET_Z14 (TARGET_ZARCH && TARGET_CPU_Z14)
+#define TARGET_Z14_P(opts) \
+   (TARGET_ZARCH_P (opts->x_target_flags) && TARGET_CPU_Z14_P (opts))
 #define TARGET_VXE \
(TARGET_VX && TARGET_CPU_VXE)
 #define TARGET_VXE_P(opts) \
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 1286d2c..3bd18ac 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -506,11 +506,11 @@
 ;; Processor type.  This attribute must exactly match the processor_type
 ;; enumeration in s390.h.
 
-(define_attr "cpu" "z900,z990,z9_109,z9_ec,z10,z196,zEC12,z13,arch12"
+(define_attr "cpu" "z900,z990,z9_109,z9_ec,z10,z196,zEC12,z13,z14"
   (const (symbol_ref "s390_tune_attr")))
 
 (define_attr "cpu_facility"
-  
"standard,ieee,zarch,cpu_zarch,longdisp,extimm,dfp,z10,z196,zEC12,vx,z13,arch12,vxe"
+  
"standard,ieee,zarch,cpu_zarch,longdisp,extimm,dfp,z10,z196,zEC12,vx,z13,z14,vxe"
   (const_string "standard"))
 
 (define_attr "enabled" ""
@@ -560,8 +560,8 @@
   (match_test "TARGET_Z13"))
 (const_int 1)
 
- (and (eq_attr "cpu_facility" "arch12")
-  (match_test "TARGET_ARCH12"))
+ (and (eq_attr "cpu_facility" "z14")
+  (match_test "TARGET_Z14"))
 (const_int 1)
 
  (and (eq_attr "cpu_facility" "vxe")
@@ -5866,7 +5866,7 @@
 (plus:DI (sign_extend:DI (match_operand:HI 2 "memory_operand""T"))
 (match_operand:DI 1 "register_operand"  "0")))
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_ARCH12"
+  "TARGET_Z14"
   "agh\t%0,%2"
   [(set_attr "op_type"  "RXY")])
 
@@ -6275,7 +6275,7 @@
 (minus:DI (match_operand:DI 1 "register_operand"  "0")
   (sign_extend:D

[PATCH 2/2] S/390: Support IBM z14 Model ZR1 with -march=native

2018-10-02 Thread Andreas Krebbel
This adds the CPU model number of the IBM z14 Model ZR1 machine to
-march=native.  The patch doesn't actually change anything since we
default to z14 for unknown CPU model numbers anyway.  So this is just
for the sake of completeness.

2018-10-02  Andreas Krebbel  

* config/s390/driver-native.c (s390_host_detect_local_cpu): Add
0x3907 as CPU model number.
---
 gcc/config/s390/driver-native.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/s390/driver-native.c b/gcc/config/s390/driver-native.c
index 4b2dd6e..97f7b05 100644
--- a/gcc/config/s390/driver-native.c
+++ b/gcc/config/s390/driver-native.c
@@ -116,6 +116,7 @@ s390_host_detect_local_cpu (int argc, const char **argv)
  cpu = "z13";
  break;
case 0x3906:
+   case 0x3907:
  cpu = "z14";
  break;
default:
-- 
2.7.4



Re: [PATCH 0/2][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-10-02 Thread Peter Bergner
On 10/2/18 10:32 AM, H.J. Lu wrote:
> On Tue, Oct 2, 2018 at 7:59 AM Peter Bergner  wrote:
>> I'm currently performing bootstrap and regtesting on powerpc64le-linux and
>> x86_64-linux.  H.J., could you please test this patch on i686 to verify it
>> doesn't expose any other problems there?  Otherwise, I'll take Jeff's
> 
> I am waiting for the result of your previous patch.  I will test this one 
> after
> that.

Great, thanks!

Peter




[PATCH, libgcc]: Use type-generic fpclassify builtins in libgcc2.c

2018-10-02 Thread Uros Bizjak
Nowadays, we have these type-generic builtins always available.
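
As a small illustration (hypothetical code, not part of libgcc2.c), the three
builtins cover the same classification the old macros open-coded:

static int
classify (double x)
{
  if (__builtin_isnan (x))
    return 0;	/* NaN */
  if (__builtin_isinf (x))
    return 1;	/* +Inf or -Inf */
  if (__builtin_isfinite (x))
    return 2;	/* zero, subnormal or normal */
  return -1;	/* not reached for IEEE doubles */
}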

2018-10-02  Uros Bizjak  

* libgcc2.c (isnan): Use __builtin_isnan.
(isfinite): Use __builtin_isfinite.
(isinf): Use __builtin_isinf.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.
diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index f418f3a354de..8ac2025f7af6 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -1939,15 +1939,9 @@ NAME (TYPE x, int m)
 #define CONCAT2(A,B)   _CONCAT2(A,B)
 #define _CONCAT2(A,B)  A##B
 
-/* All of these would be present in a full C99 implementation of 
-   and .  Our problem is that only a few systems have such full
-   implementations.  Further, libgcc_s.so isn't currently linked against
-   libm.so, and even for systems that do provide full C99, the extra overhead
-   of all programs using libgcc having to link against libm.  So avoid it.  */
-
-#define isnan(x)   __builtin_expect ((x) != (x), 0)
-#define isfinite(x)__builtin_expect (!isnan((x) - (x)), 1)
-#define isinf(x)   __builtin_expect (!isnan(x) & !isfinite(x), 0)
+#define isnan(x)   __builtin_isnan (x)
+#define isfinite(x)__builtin_isfinite (x)
+#define isinf(x)   __builtin_isinf (x)
 
 #define INFINITY   CONCAT2(__builtin_huge_val, CEXT) ()
 #define I  1i


Re: [PATCH,Fortran] Fix libgfortran/io/close.c for !HAVE_UNLINK_OPEN_FILE

2018-10-02 Thread Janne Blomqvist
On Tue, Oct 2, 2018 at 2:08 PM Gerald Pfeifer  wrote:

> Revision r215307 | jb | 2014-09-16 23:40:28 +0200 (Di, 16 Sep 2014)
>
>PR libfortran/62768 Handle filenames with embedded null characters.
>:
>
> made the changes like the following to libgfortran/io/close.c
>
>#if !HAVE_UNLINK_OPEN_FILE
>- path = fc_strdup (u->file, u->file_len);
>+ path = strdup (u->filename);
>#endif
>
>
> One of our users now reported this build failure for a system where
> (for whatever reason) HAVE_UNLINK_OPEN_FILE is not defined:
>
>.../GCC-HEAD/libgfortran/io/close.c:94:11: error: implicit declaration
> of function ‘strdup’
>94 |path = strdup (u->filename);
>   |   ^~
>
>
> By #undef-ing HAVE_UNLINK_OPEN_FILE between the #include "..." and
> #include <...> statements in libgfortran/io/close.c I could reproduce
> this on FreeBSD 11/i386.
>
> And I could validate the fix below, both with and without that #undef
> in place.
>
>
> Tested on i386-unknown-freebsd11.1.
>
>
> Okay to commit?
>
> I'd also like to apply this to older release branches (down to GCC 6)
> since it is obviously broken and the fix appears straightforward. If
> approved, I'm thinking to wait about a week or two before making each
> step backwards (from HEAD to 8, 8 to 7, and 7 to 6).
>
> Gerald
>

Ok, thanks!


-- 
Janne Blomqvist


[PATCH, AArch64 v2 00/11] LSE atomics out-of-line

2018-10-02 Thread Richard Henderson
Changes since v1:
  * Use config/t-slibgcc-libgcc instead of gcc.c changes.
  * Some style fixes.
  * Ifdefs to work with old glibc.

  * Force TImode registers into even regnos.
Required by CASP, allowed by the ABI, and seen as a
simpler solution than adding two new register classes.

  * Use match_dup instead of matching constraints for CAS{P}.
Matching constraints result in lots of extraneous moves
for TImode, and keeping the expander interface the same
for non-TImode simplifies the code.


r~


Richard Henderson (11):
  aarch64: Simplify LSE cas generation
  aarch64: Improve cas generation
  aarch64: Improve swp generation
  aarch64: Improve atomic-op lse generation
  aarch64: Emit LSE st instructions
  Add visibility to libfunc constructors
  aarch64: Add out-of-line functions for LSE atomics
  aarch64: Implement -matomic-ool
  aarch64: Force TImode values into even registers
  aarch64: Implement TImode compare-and-swap
  Enable -matomic-ool by default

 gcc/config/aarch64/aarch64-protos.h   |  20 +-
 gcc/optabs-libfuncs.h |   2 +
 gcc/common/config/aarch64/aarch64-common.c|   6 +-
 gcc/config/aarch64/aarch64.c  | 494 +++---
 gcc/optabs-libfuncs.c |  26 +-
 .../atomic-comp-swap-release-acquire.c|   2 +-
 .../gcc.target/aarch64/atomic-inst-ldadd.c|  18 +-
 .../gcc.target/aarch64/atomic-inst-ldlogic.c  |  54 +-
 .../gcc.target/aarch64/atomic-op-acq_rel.c|   2 +-
 .../gcc.target/aarch64/atomic-op-acquire.c|   2 +-
 .../gcc.target/aarch64/atomic-op-char.c   |   2 +-
 .../gcc.target/aarch64/atomic-op-consume.c|   2 +-
 .../gcc.target/aarch64/atomic-op-imm.c|   2 +-
 .../gcc.target/aarch64/atomic-op-int.c|   2 +-
 .../gcc.target/aarch64/atomic-op-long.c   |   2 +-
 .../gcc.target/aarch64/atomic-op-relaxed.c|   2 +-
 .../gcc.target/aarch64/atomic-op-release.c|   2 +-
 .../gcc.target/aarch64/atomic-op-seq_cst.c|   2 +-
 .../gcc.target/aarch64/atomic-op-short.c  |   2 +-
 .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |   2 +-
 .../atomic_cmp_exchange_zero_strong_1.c   |   2 +-
 .../gcc.target/aarch64/sync-comp-swap.c   |   2 +-
 .../gcc.target/aarch64/sync-op-acquire.c  |   2 +-
 .../gcc.target/aarch64/sync-op-full.c |   2 +-
 libgcc/config/aarch64/lse.c   | 282 
 gcc/config/aarch64/aarch64.opt|   4 +
 gcc/config/aarch64/atomics.md | 608 ++
 gcc/config/aarch64/iterators.md   |   8 +-
 gcc/config/aarch64/predicates.md  |  12 +
 gcc/doc/invoke.texi   |  14 +-
 libgcc/config.host|   4 +
 libgcc/config/aarch64/t-lse   |  48 ++
 32 files changed, 1058 insertions(+), 576 deletions(-)
 create mode 100644 libgcc/config/aarch64/lse.c
 create mode 100644 libgcc/config/aarch64/t-lse

-- 
2.17.1



[PATCH, AArch64 v2 01/11] aarch64: Simplify LSE cas generation

2018-10-02 Thread Richard Henderson
The cas insn is a single insn, and if expanded properly need not
be split after reload.  Use the proper inputs for the insn.
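
For illustration, a typical source-level CAS that exercises this path
(hypothetical user code, assuming -march=armv8.1-a so TARGET_LSE is set;
not part of the patch):

int
expect_and_set (int *p, int expected, int desired)
{
  /* With the patch, 'expected' is moved into the CAS result register at
     expand time and the comparison is emitted immediately, so the single
     "cas" instruction needs no post-reload split.  */
  return __atomic_compare_exchange_n (p, &expected, desired, 0,
				      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}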

* config/aarch64/aarch64.c (aarch64_expand_compare_and_swap):
Force oldval into the rval register for TARGET_LSE; emit the compare
during initial expansion so that it may be deleted if unused.
(aarch64_gen_atomic_cas): Remove.
* config/aarch64/atomics.md (@aarch64_compare_and_swap_lse):
Change =&r to +r for operand 0; use match_dup for operand 2;
remove is_weak and mod_f operands as unused.  Drop the split
and merge with...
(@aarch64_atomic_cas): ... this pattern's output; remove.
(@aarch64_compare_and_swap_lse): Similarly.
(@aarch64_atomic_cas): Similarly.
---
 gcc/config/aarch64/aarch64-protos.h |   1 -
 gcc/config/aarch64/aarch64.c|  46 ---
 gcc/config/aarch64/atomics.md   | 121 
 3 files changed, 49 insertions(+), 119 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index caf1d2041f0..3d045cf43be 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -562,7 +562,6 @@ rtx aarch64_load_tp (rtx);
 
 void aarch64_expand_compare_and_swap (rtx op[]);
 void aarch64_split_compare_and_swap (rtx op[]);
-void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_atomic_ldop_supported_p (enum rtx_code);
 void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12f7dfe9a75..fbec54fe5da 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14183,16 +14183,27 @@ aarch64_expand_compare_and_swap (rtx operands[])
 }
 
   if (TARGET_LSE)
-emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem, oldval,
-newval, is_weak, mod_s,
-mod_f));
+{
+  /* The CAS insn requires oldval and rval overlap, but we need to
+have a copy of oldval saved across the operation to tell if
+the operation is successful.  */
+  if (mode == QImode || mode == HImode)
+   rval = copy_to_mode_reg (SImode, gen_lowpart (SImode, oldval));
+  else if (reg_overlap_mentioned_p (rval, oldval))
+rval = copy_to_mode_reg (mode, oldval);
+  else
+   emit_move_insn (rval, oldval);
+  emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
+  newval, mod_s));
+  aarch64_gen_compare_reg (EQ, rval, oldval);
+}
   else
 emit_insn (gen_aarch64_compare_and_swap (mode, rval, mem, oldval, newval,
 is_weak, mod_s, mod_f));
 
-
   if (mode == QImode || mode == HImode)
-emit_move_insn (operands[1], gen_lowpart (mode, rval));
+rval = gen_lowpart (mode, rval);
+  emit_move_insn (operands[1], rval);
 
   x = gen_rtx_REG (CCmode, CC_REGNUM);
   x = gen_rtx_EQ (SImode, x, const0_rtx);
@@ -14242,31 +14253,6 @@ aarch64_emit_post_barrier (enum memmodel model)
 }
 }
 
-/* Emit an atomic compare-and-swap operation.  RVAL is the destination register
-   for the data in memory.  EXPECTED is the value expected to be in memory.
-   DESIRED is the value to store to memory.  MEM is the memory location.  MODEL
-   is the memory ordering to use.  */
-
-void
-aarch64_gen_atomic_cas (rtx rval, rtx mem,
-   rtx expected, rtx desired,
-   rtx model)
-{
-  machine_mode mode;
-
-  mode = GET_MODE (mem);
-
-  /* Move the expected value into the CAS destination register.  */
-  emit_insn (gen_rtx_SET (rval, expected));
-
-  /* Emit the CAS.  */
-  emit_insn (gen_aarch64_atomic_cas (mode, rval, mem, desired, model));
-
-  /* Compare the expected value with the value loaded by the CAS, to establish
- whether the swap was made.  */
-  aarch64_gen_compare_reg (EQ, rval, expected);
-}
-
 /* Split a compare and swap pattern.  */
 
 void
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index bba8e9e9c8e..22660850af1 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -85,56 +85,50 @@
   }
 )
 
-(define_insn_and_split "@aarch64_compare_and_swap_lse"
-  [(set (reg:CC CC_REGNUM) ;; bool out
-(unspec_volatile:CC [(const_int 0)] UNSPECV_ATOMIC_CMPSW))
-   (set (match_operand:SI 0 "register_operand" "=&r")  ;; val out
+(define_insn "@aarch64_compare_and_swap_lse"
+  [(set (match_operand:SI 0 "register_operand" "+r")   ;; val out
 (zero_extend:SI
-  (match_operand:SHORT 1 "aarch64_sync_memory_operand" "+Q"))) ;; memory
+ (match_operand:SHORT 1 "aarch64_sync_memory_operand" "+Q"))) ;; memory
(set (match_dup 1)
 (unspec_volatile:SHORT
-  [(match_operand:SI 2 "aarch64_plus_operand

[PATCH, AArch64 v2 02/11] aarch64: Improve cas generation

2018-10-02 Thread Richard Henderson
Do not zero-extend the input to the cas for subword operations;
instead, use the appropriate zero-extending compare insns.
Correct the predicates and constraints for immediate expected operand.
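
A minimal subword example of the case being changed (illustrative only,
hypothetical function name, assuming -march=armv8.1-a):

int
expect_and_set_u8 (unsigned char *p, unsigned char expected,
		   unsigned char desired)
{
  /* With the patch the expected value is no longer zero-extended up front;
     instead the CAS result is compared against it with a zero-extending
     compare.  */
  return __atomic_compare_exchange_n (p, &expected, desired, 0,
				      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}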

* config/aarch64/aarch64.c (aarch64_gen_compare_reg_maybe_ze): New.
(aarch64_split_compare_and_swap): Use it.
(aarch64_expand_compare_and_swap): Likewise.  Remove convert_modes;
test oldval against the proper predicate.
* config/aarch64/atomics.md (@atomic_compare_and_swap):
Use nonmemory_operand for expected.
(cas_short_expected_pred): New.
(@aarch64_compare_and_swap): Use it; use "rn" not "rI" to match.
(@aarch64_compare_and_swap): Use "rn" not "rI" for expected.
* config/aarch64/predicates.md (aarch64_plushi_immediate): New.
(aarch64_plushi_operand): New.
---
 gcc/config/aarch64/aarch64.c | 90 +++-
 gcc/config/aarch64/atomics.md| 19 ---
 gcc/config/aarch64/predicates.md | 12 +
 3 files changed, 76 insertions(+), 45 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fbec54fe5da..0e2b85de1e3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1613,6 +1613,33 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
   return cc_reg;
 }
 
+/* Similarly, but maybe zero-extend Y if Y_MODE < SImode.  */
+
+static rtx
+aarch64_gen_compare_reg_maybe_ze(RTX_CODE code, rtx x, rtx y,
+ machine_mode y_mode)
+{
+  if (y_mode == E_QImode || y_mode == E_HImode)
+{
+  if (CONST_INT_P (y))
+   y = GEN_INT (INTVAL (y) & GET_MODE_MASK (y_mode));
+  else
+   {
+ rtx t, cc_reg;
+ machine_mode cc_mode;
+
+ t = gen_rtx_ZERO_EXTEND (SImode, y);
+ t = gen_rtx_COMPARE (CC_SWPmode, t, x);
+ cc_mode = CC_SWPmode;
+ cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
+ emit_set_insn (cc_reg, t);
+ return cc_reg;
+   }
+}
+
+  return aarch64_gen_compare_reg (code, x, y);
+}
+
 /* Build the SYMBOL_REF for __tls_get_addr.  */
 
 static GTY(()) rtx tls_get_addr_libfunc;
@@ -14138,8 +14165,8 @@ aarch64_emit_unlikely_jump (rtx insn)
 void
 aarch64_expand_compare_and_swap (rtx operands[])
 {
-  rtx bval, rval, mem, oldval, newval, is_weak, mod_s, mod_f, x;
-  machine_mode mode, cmp_mode;
+  rtx bval, rval, mem, oldval, newval, is_weak, mod_s, mod_f, x, cc_reg;
+  machine_mode mode, r_mode;
 
   bval = operands[0];
   rval = operands[1];
@@ -14150,36 +14177,19 @@ aarch64_expand_compare_and_swap (rtx operands[])
   mod_s = operands[6];
   mod_f = operands[7];
   mode = GET_MODE (mem);
-  cmp_mode = mode;
 
   /* Normally the succ memory model must be stronger than fail, but in the
  unlikely event of fail being ACQUIRE and succ being RELEASE we need to
  promote succ to ACQ_REL so that we don't lose the acquire semantics.  */
-
   if (is_mm_acquire (memmodel_from_int (INTVAL (mod_f)))
   && is_mm_release (memmodel_from_int (INTVAL (mod_s
 mod_s = GEN_INT (MEMMODEL_ACQ_REL);
 
-  switch (mode)
+  r_mode = mode;
+  if (mode == QImode || mode == HImode)
 {
-case E_QImode:
-case E_HImode:
-  /* For short modes, we're going to perform the comparison in SImode,
-so do the zero-extension now.  */
-  cmp_mode = SImode;
-  rval = gen_reg_rtx (SImode);
-  oldval = convert_modes (SImode, mode, oldval, true);
-  /* Fall through.  */
-
-case E_SImode:
-case E_DImode:
-  /* Force the value into a register if needed.  */
-  if (!aarch64_plus_operand (oldval, mode))
-   oldval = force_reg (cmp_mode, oldval);
-  break;
-
-default:
-  gcc_unreachable ();
+  r_mode = SImode;
+  rval = gen_reg_rtx (r_mode);
 }
 
   if (TARGET_LSE)
@@ -14187,26 +14197,32 @@ aarch64_expand_compare_and_swap (rtx operands[])
   /* The CAS insn requires oldval and rval overlap, but we need to
 have a copy of oldval saved across the operation to tell if
 the operation is successful.  */
-  if (mode == QImode || mode == HImode)
-   rval = copy_to_mode_reg (SImode, gen_lowpart (SImode, oldval));
-  else if (reg_overlap_mentioned_p (rval, oldval))
-rval = copy_to_mode_reg (mode, oldval);
-  else
-   emit_move_insn (rval, oldval);
+  if (reg_overlap_mentioned_p (rval, oldval))
+rval = copy_to_mode_reg (r_mode, oldval);
+  else 
+   emit_move_insn (rval, gen_lowpart (r_mode, oldval));
+
   emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
   newval, mod_s));
-  aarch64_gen_compare_reg (EQ, rval, oldval);
+  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
 }
   else
-emit_insn (gen_aarch64_compare_and_swap (mode, rval, mem, oldval, newval,
-is_weak, mod_s, mod_f));
+{
+   

[PATCH, AArch64 v2 03/11] aarch64: Improve swp generation

2018-10-02 Thread Richard Henderson
Allow zero as an input; fix constraints; avoid unnecessary split.
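
The "allow zero as an input" part can be seen with source like this
(illustrative, hypothetical function name, assuming -march=armv8.1-a):
exchanging in a literal zero lets the zero register be used directly as the
stored value.

int
take_flag (int *p)
{
  /* The stored value is constant zero, so the swap instruction can take
     wzr as its source operand instead of needing a scratch register.  */
  return __atomic_exchange_n (p, 0, __ATOMIC_SEQ_CST);
}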

* config/aarch64/aarch64.c (aarch64_emit_atomic_swap): Remove.
(aarch64_gen_atomic_ldop): Don't call it.
* config/aarch64/atomics.md (atomic_exchange):
Use aarch64_reg_or_zero.
(aarch64_atomic_exchange): Likewise.
(aarch64_atomic_exchange_lse): Remove split; remove & from
operand 0; use aarch64_reg_or_zero for input; merge ...
(@aarch64_atomic_swp): ... this and remove.
---
 gcc/config/aarch64/aarch64.c  | 13 --
 gcc/config/aarch64/atomics.md | 49 +++
 2 files changed, 15 insertions(+), 47 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0e2b85de1e3..f7b0af2589e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14403,15 +14403,6 @@ aarch64_emit_bic (machine_mode mode, rtx dst, rtx s1, 
rtx s2, int shift)
   emit_insn (gen (dst, s2, shift_rtx, s1));
 }
 
-/* Emit an atomic swap.  */
-
-static void
-aarch64_emit_atomic_swap (machine_mode mode, rtx dst, rtx value,
- rtx mem, rtx model)
-{
-  emit_insn (gen_aarch64_atomic_swp (mode, dst, mem, value, model));
-}
-
 /* Emit an atomic load+operate.  CODE is the operation.  OUT_DATA is the
location to store the data read from memory.  OUT_RESULT is the location to
store the result of the operation.  MEM is the memory location to read and
@@ -14452,10 +14443,6 @@ aarch64_gen_atomic_ldop (enum rtx_code code, rtx 
out_data, rtx out_result,
  a SET then emit a swap instruction and finish.  */
   switch (code)
 {
-case SET:
-  aarch64_emit_atomic_swap (mode, out_data, src, mem, model_rtx);
-  return;
-
 case MINUS:
   /* Negate the value and treat it as a PLUS.  */
   {
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index e44301b40c7..bc9e396dc96 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -136,7 +136,7 @@
 (define_expand "atomic_exchange"
  [(match_operand:ALLI 0 "register_operand" "")
   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "")
-  (match_operand:ALLI 2 "register_operand" "")
+  (match_operand:ALLI 2 "aarch64_reg_or_zero" "")
   (match_operand:SI 3 "const_int_operand" "")]
   ""
   {
@@ -156,10 +156,10 @@
 
 (define_insn_and_split "aarch64_atomic_exchange"
   [(set (match_operand:ALLI 0 "register_operand" "=&r");; 
output
-(match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q")) ;; memory
+(match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q")) ;; memory
(set (match_dup 1)
 (unspec_volatile:ALLI
-  [(match_operand:ALLI 2 "register_operand" "r")   ;; input
+  [(match_operand:ALLI 2 "aarch64_reg_or_zero" "rZ")   ;; input
(match_operand:SI 3 "const_int_operand" "")];; model
   UNSPECV_ATOMIC_EXCHG))
(clobber (reg:CC CC_REGNUM))
@@ -175,22 +175,25 @@
   }
 )
 
-(define_insn_and_split "aarch64_atomic_exchange_lse"
-  [(set (match_operand:ALLI 0 "register_operand" "=&r")
+(define_insn "aarch64_atomic_exchange_lse"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
 (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
(set (match_dup 1)
 (unspec_volatile:ALLI
-  [(match_operand:ALLI 2 "register_operand" "r")
+  [(match_operand:ALLI 2 "aarch64_reg_or_zero" "rZ")
(match_operand:SI 3 "const_int_operand" "")]
   UNSPECV_ATOMIC_EXCHG))]
   "TARGET_LSE"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
   {
-aarch64_gen_atomic_ldop (SET, operands[0], NULL, operands[1],
-operands[2], operands[3]);
-DONE;
+enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
+if (is_mm_relaxed (model))
+  return "swp\t%2, %0, %1";
+else if (is_mm_acquire (model) || is_mm_consume (model))
+  return "swpa\t%2, %0, %1";
+else if (is_mm_release (model))
+  return "swpl\t%2, %0, %1";
+else
+  return "swpal\t%2, %0, %1";
   }
 )
 
@@ -585,28 +588,6 @@
 
 ;; ARMv8.1-A LSE instructions.
 
-;; Atomic swap with memory.
-(define_insn "@aarch64_atomic_swp"
- [(set (match_operand:ALLI 0 "register_operand" "+&r")
-   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
-  (set (match_dup 1)
-   (unspec_volatile:ALLI
-[(match_operand:ALLI 2 "register_operand" "r")
- (match_operand:SI 3 "const_int_operand" "")]
-UNSPECV_ATOMIC_SWP))]
-  "TARGET_LSE && reload_completed"
-  {
-enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
-if (is_mm_relaxed (model))
-  return "swp\t%2, %0, %1";
-else if (is_mm_acquire (model) || is_mm_consume (model))
-  return "swpa\t%2, %0, %1";
-else if (is_mm_release (model))
-  return "swpl\t%2, %0, %1";
-else
-  return "swpal\t%2, %0, %1";
-  })
-
 ;; Atomic load-op: Load data, operate, store result, keep data.
 
 (define_insn "@aarch64_ato

[PATCH, AArch64 v2 04/11] aarch64: Improve atomic-op lse generation

2018-10-02 Thread Richard Henderson
Fix constraints; avoid unnecessary split.  Drop the use of the atomic_op
iterator in favor of the ATOMIC_LDOP iterator; this is simpler and more
logical for ldclr aka bic.
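
As an illustration of the ldclr/bic point (hypothetical user code, assuming
-march=armv8.1-a): an atomic AND is implemented by LDCLR, which operates on
the complement of the AND operand, so it behaves like a bit-clear.

unsigned
fetch_and_mask (unsigned *p, unsigned mask)
{
  /* An atomic AND; on LSE this is emitted as LDCLR (load and bit-clear)
     on the complemented operand.  */
  return __atomic_fetch_and (p, mask, __ATOMIC_RELAXED);
}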

* config/aarch64/aarch64.c (aarch64_emit_bic): Remove.
(aarch64_atomic_ldop_supported_p): Remove.
(aarch64_gen_atomic_ldop): Remove.
* config/aarch64/atomic.md (atomic_):
Fully expand LSE operations here.
(atomic_fetch_): Likewise.
(atomic__fetch): Likewise.
(aarch64_atomic__lse): Drop atomic_op iterator
and use ATOMIC_LDOP instead; use register_operand for the input;
drop the split and emit insns directly.
(aarch64_atomic_fetch__lse): Likewise.
(aarch64_atomic__fetch_lse): Remove.
(@aarch64_atomic_load): Remove.
---
 gcc/config/aarch64/aarch64-protos.h |   2 -
 gcc/config/aarch64/aarch64.c| 176 -
 gcc/config/aarch64/atomics.md   | 197 +++-
 gcc/config/aarch64/iterators.md |   5 +-
 4 files changed, 108 insertions(+), 272 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 3d045cf43be..1d2f8487d1a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -563,8 +563,6 @@ rtx aarch64_load_tp (rtx);
 void aarch64_expand_compare_and_swap (rtx op[]);
 void aarch64_split_compare_and_swap (rtx op[]);
 
-bool aarch64_atomic_ldop_supported_p (enum rtx_code);
-void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_gen_adjusted_ldpstp (rtx *, bool, scalar_mode, RTX_CODE);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f7b0af2589e..867759f7e80 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14226,32 +14226,6 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
-/* Test whether the target supports using a atomic load-operate instruction.
-   CODE is the operation and AFTER is TRUE if the data in memory after the
-   operation should be returned and FALSE if the data before the operation
-   should be returned.  Returns FALSE if the operation isn't supported by the
-   architecture.  */
-
-bool
-aarch64_atomic_ldop_supported_p (enum rtx_code code)
-{
-  if (!TARGET_LSE)
-return false;
-
-  switch (code)
-{
-case SET:
-case AND:
-case IOR:
-case XOR:
-case MINUS:
-case PLUS:
-  return true;
-default:
-  return false;
-}
-}
-
 /* Emit a barrier, that is appropriate for memory model MODEL, at the end of a
sequence implementing an atomic operation.  */
 
@@ -14384,156 +14358,6 @@ aarch64_split_compare_and_swap (rtx operands[])
 aarch64_emit_post_barrier (model);
 }
 
-/* Emit a BIC instruction.  */
-
-static void
-aarch64_emit_bic (machine_mode mode, rtx dst, rtx s1, rtx s2, int shift)
-{
-  rtx shift_rtx = GEN_INT (shift);
-  rtx (*gen) (rtx, rtx, rtx, rtx);
-
-  switch (mode)
-{
-case E_SImode: gen = gen_and_one_cmpl_lshrsi3; break;
-case E_DImode: gen = gen_and_one_cmpl_lshrdi3; break;
-default:
-  gcc_unreachable ();
-}
-
-  emit_insn (gen (dst, s2, shift_rtx, s1));
-}
-
-/* Emit an atomic load+operate.  CODE is the operation.  OUT_DATA is the
-   location to store the data read from memory.  OUT_RESULT is the location to
-   store the result of the operation.  MEM is the memory location to read and
-   modify.  MODEL_RTX is the memory ordering to use.  VALUE is the second
-   operand for the operation.  Either OUT_DATA or OUT_RESULT, but not both, can
-   be NULL.  */
-
-void
-aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data, rtx out_result,
-rtx mem, rtx value, rtx model_rtx)
-{
-  machine_mode mode = GET_MODE (mem);
-  machine_mode wmode = (mode == DImode ? DImode : SImode);
-  const bool short_mode = (mode < SImode);
-  int ldop_code;
-  rtx src;
-  rtx x;
-
-  if (out_data)
-out_data = gen_lowpart (mode, out_data);
-
-  if (out_result)
-out_result = gen_lowpart (mode, out_result);
-
-  /* Make sure the value is in a register, putting it into a destination
- register if it needs to be manipulated.  */
-  if (!register_operand (value, mode)
-  || code == AND || code == MINUS)
-{
-  src = out_result ? out_result : out_data;
-  emit_move_insn (src, gen_lowpart (mode, value));
-}
-  else
-src = value;
-  gcc_assert (register_operand (src, mode));
-
-  /* Preprocess the data for the operation as necessary.  If the operation is
- a SET then emit a swap instruction and finish.  */
-  switch (code)
-{
-case MINUS:
-  /* Negate the value and treat it as a PLUS.  */
-  {
-   rtx neg_src;
-
-   /* Resize the value if necessary.  */
-   if (short_mode)
- src = gen_lowpart (wmode, src);
-
-   neg_src 

[PATCH, AArch64 v2 05/11] aarch64: Emit LSE st instructions

2018-10-02 Thread Richard Henderson
When the result of an operation is not used, we can ignore the
result by storing to XZR.  For two of the memory models, using
XZR with LD<op> has a preferred assembler alias, ST<op>.
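
A small example of the "result not used" case (illustrative only, hypothetical
function name, assuming -march=armv8.1-a):

void
bump (unsigned long *counter)
{
  /* The fetched value is unused, so the destination can be XZR; for the
     relaxed and release models the assembler aliases STADD/STADDL are
     emitted instead of LDADD/LDADDL.  */
  __atomic_fetch_add (counter, 1, __ATOMIC_RELAXED);
}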

* config/aarch64/atomics.md (aarch64_atomic__lse):
Use ST for relaxed and release models; load to XZR otherwise;
remove the now unnecessary scratch register.

* gcc.target/aarch64/atomic-inst-ldadd.c: Expect stadd{,l}.
* gcc.target/aarch64/atomic-inst-ldlogic.c: Similarly.
---
 .../gcc.target/aarch64/atomic-inst-ldadd.c| 18 ---
 .../gcc.target/aarch64/atomic-inst-ldlogic.c  | 54 ---
 gcc/config/aarch64/atomics.md | 15 +++---
 3 files changed, 57 insertions(+), 30 deletions(-)
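
The calls this patch improves are simply atomic read-modify-writes whose
result is discarded.  A minimal, self-contained illustration of that
situation (my own example, not taken from the testsuite; variable and
function names are made up):

/* With -march=armv8-a+lse, the discarded results below let the compiler
   use the ST<op> forms (stadd/staddl) instead of LD<op> into a scratch.  */
#include <stdint.h>

uint32_t counter;

void bump_relaxed (void)
{
  __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);  /* -> stadd  */
}

void bump_release (void)
{
  __atomic_fetch_add (&counter, 1, __ATOMIC_RELEASE);  /* -> staddl */
}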

diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c 
b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
index 4b2282c6861..db2206186b4 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
@@ -67,20 +67,26 @@ TEST (add_load_notreturn, ADD_LOAD_NORETURN)
 TEST (sub_load, SUB_LOAD)
 TEST (sub_load_notreturn, SUB_LOAD_NORETURN)
 
-/* { dg-final { scan-assembler-times "ldaddb\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddb\t" 8} } */
 /* { dg-final { scan-assembler-times "ldaddab\t" 32} } */
-/* { dg-final { scan-assembler-times "ldaddlb\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddlb\t" 8} } */
 /* { dg-final { scan-assembler-times "ldaddalb\t" 32} } */
+/* { dg-final { scan-assembler-times "staddb\t" 8} } */
+/* { dg-final { scan-assembler-times "staddlb\t" 8} } */
 
-/* { dg-final { scan-assembler-times "ldaddh\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddh\t" 8} } */
 /* { dg-final { scan-assembler-times "ldaddah\t" 32} } */
-/* { dg-final { scan-assembler-times "ldaddlh\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddlh\t" 8} } */
 /* { dg-final { scan-assembler-times "ldaddalh\t" 32} } */
+/* { dg-final { scan-assembler-times "staddh\t" 8} } */
+/* { dg-final { scan-assembler-times "staddlh\t" 8} } */
 
-/* { dg-final { scan-assembler-times "ldadd\t" 32} } */
+/* { dg-final { scan-assembler-times "ldadd\t" 16} } */
 /* { dg-final { scan-assembler-times "ldadda\t" 64} } */
-/* { dg-final { scan-assembler-times "ldaddl\t" 32} } */
+/* { dg-final { scan-assembler-times "ldaddl\t" 16} } */
 /* { dg-final { scan-assembler-times "ldaddal\t" 64} } */
+/* { dg-final { scan-assembler-times "stadd\t" 16} } */
+/* { dg-final { scan-assembler-times "staddl\t" 16} } */
 
 /* { dg-final { scan-assembler-not "ldaxr\t" } } */
 /* { dg-final { scan-assembler-not "stlxr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c 
b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
index 4879d52b9b4..b8a53e0a676 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
@@ -101,54 +101,72 @@ TEST (xor_load_notreturn, XOR_LOAD_NORETURN)
 
 /* Load-OR.  */
 
-/* { dg-final { scan-assembler-times "ldsetb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetb\t" 4} } */
 /* { dg-final { scan-assembler-times "ldsetab\t" 16} } */
-/* { dg-final { scan-assembler-times "ldsetlb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetlb\t" 4} } */
 /* { dg-final { scan-assembler-times "ldsetalb\t" 16} } */
+/* { dg-final { scan-assembler-times "stsetb\t" 4} } */
+/* { dg-final { scan-assembler-times "stsetlb\t" 4} } */
 
-/* { dg-final { scan-assembler-times "ldseth\t" 8} } */
+/* { dg-final { scan-assembler-times "ldseth\t" 4} } */
 /* { dg-final { scan-assembler-times "ldsetah\t" 16} } */
-/* { dg-final { scan-assembler-times "ldsetlh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetlh\t" 4} } */
 /* { dg-final { scan-assembler-times "ldsetalh\t" 16} } */
+/* { dg-final { scan-assembler-times "stseth\t" 4} } */
+/* { dg-final { scan-assembler-times "stsetlh\t" 4} } */
 
-/* { dg-final { scan-assembler-times "ldset\t" 16} } */
+/* { dg-final { scan-assembler-times "ldset\t" 8} } */
 /* { dg-final { scan-assembler-times "ldseta\t" 32} } */
-/* { dg-final { scan-assembler-times "ldsetl\t" 16} } */
+/* { dg-final { scan-assembler-times "ldsetl\t" 8} } */
 /* { dg-final { scan-assembler-times "ldsetal\t" 32} } */
+/* { dg-final { scan-assembler-times "stset\t" 8} } */
+/* { dg-final { scan-assembler-times "stsetl\t" 8} } */
 
 /* Load-AND.  */
 
-/* { dg-final { scan-assembler-times "ldclrb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrb\t" 4} } */
 /* { dg-final { scan-assembler-times "ldclrab\t" 16} } */
-/* { dg-final { scan-assembler-times "ldclrlb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrlb\t" 4} } */
 /* { dg-final { scan-assembler-times "ldclralb\t" 16} } */
+/* { dg-final { scan-assembler-times "stclrb\t" 4} } */
+/* { dg-final { scan-assembler-times "stclrlb\t" 4} } */
 
-/* { dg-final { scan-assembler-times "ldclrh\t" 8} } */
+/* { dg-final { scan-assemb

[PATCH, AArch64 v2 06/11] Add visibility to libfunc constructors

2018-10-02 Thread Richard Henderson
* optabs-libfuncs.c (build_libfunc_function_visibility):
New, split out from...
(build_libfunc_function): ... here.
(init_one_libfunc_visibility): New, split out from ...
(init_one_libfunc): ... here.
---
 gcc/optabs-libfuncs.h |  2 ++
 gcc/optabs-libfuncs.c | 26 --
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/gcc/optabs-libfuncs.h b/gcc/optabs-libfuncs.h
index 0669ea1fdd7..cf39da36887 100644
--- a/gcc/optabs-libfuncs.h
+++ b/gcc/optabs-libfuncs.h
@@ -63,7 +63,9 @@ void gen_satfract_conv_libfunc (convert_optab, const char *,
 void gen_satfractuns_conv_libfunc (convert_optab, const char *,
   machine_mode, machine_mode);
 
+tree build_libfunc_function_visibility (const char *, symbol_visibility);
 tree build_libfunc_function (const char *);
+rtx init_one_libfunc_visibility (const char *, symbol_visibility);
 rtx init_one_libfunc (const char *);
 rtx set_user_assembler_libfunc (const char *, const char *);
 
diff --git a/gcc/optabs-libfuncs.c b/gcc/optabs-libfuncs.c
index bd0df8baa37..73a28e9ca7a 100644
--- a/gcc/optabs-libfuncs.c
+++ b/gcc/optabs-libfuncs.c
@@ -719,10 +719,10 @@ struct libfunc_decl_hasher : ggc_ptr_hash
 /* A table of previously-created libfuncs, hashed by name.  */
 static GTY (()) hash_table *libfunc_decls;
 
-/* Build a decl for a libfunc named NAME.  */
+/* Build a decl for a libfunc named NAME with visibility VIS.  */
 
 tree
-build_libfunc_function (const char *name)
+build_libfunc_function_visibility (const char *name, symbol_visibility vis)
 {
   /* ??? We don't have any type information; pretend this is "int foo ()".  */
   tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL,
@@ -731,7 +731,7 @@ build_libfunc_function (const char *name)
   DECL_EXTERNAL (decl) = 1;
   TREE_PUBLIC (decl) = 1;
   DECL_ARTIFICIAL (decl) = 1;
-  DECL_VISIBILITY (decl) = VISIBILITY_DEFAULT;
+  DECL_VISIBILITY (decl) = vis;
   DECL_VISIBILITY_SPECIFIED (decl) = 1;
   gcc_assert (DECL_ASSEMBLER_NAME (decl));
 
@@ -742,11 +742,19 @@ build_libfunc_function (const char *name)
   return decl;
 }
 
+/* Build a decl for a libfunc named NAME.  */
+
+tree
+build_libfunc_function (const char *name)
+{
+  return build_libfunc_function_visibility (name, VISIBILITY_DEFAULT);
+}
+
 /* Return a libfunc for NAME, creating one if we don't already have one.
-   The returned rtx is a SYMBOL_REF.  */
+   The decl is given visibility VIS.  The returned rtx is a SYMBOL_REF.  */
 
 rtx
-init_one_libfunc (const char *name)
+init_one_libfunc_visibility (const char *name, symbol_visibility vis)
 {
   tree id, decl;
   hashval_t hash;
@@ -763,12 +771,18 @@ init_one_libfunc (const char *name)
 {
   /* Create a new decl, so that it can be passed to
 targetm.encode_section_info.  */
-  decl = build_libfunc_function (name);
+  decl = build_libfunc_function_visibility (name, vis);
   *slot = decl;
 }
   return XEXP (DECL_RTL (decl), 0);
 }
 
+rtx
+init_one_libfunc (const char *name)
+{
+  return init_one_libfunc_visibility (name, VISIBILITY_DEFAULT);
+}
+
 /* Adjust the assembler name of libfunc NAME to ASMSPEC.  */
 
 rtx
-- 
2.17.1
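
The point of the new entry points is simply to let a caller ask for a
libfunc SYMBOL_REF whose decl carries an explicit ELF visibility rather
than the default.  A sketch of a call site inside a backend (the helper
name is hypothetical; the real users are added in patch 08 of this series):

  /* Resolve an out-of-line helper with hidden visibility so that calls
     to it bind locally and need no PLT indirection.  */
  rtx func = init_one_libfunc_visibility ("__example_ool_helper",
                                          VISIBILITY_HIDDEN);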



[PATCH, AArch64 v2 07/11] aarch64: Add out-of-line functions for LSE atomics

2018-10-02 Thread Richard Henderson
This is the libgcc part of the interface -- providing the functions.
Rationale is provided at the top of libgcc/config/aarch64/lse.c.

* config/aarch64/lse.c: New file.
* config/aarch64/t-lse: New file.
* config.host: Add t-lse to all aarch64 tuples.
---
 libgcc/config/aarch64/lse.c | 260 
 libgcc/config.host  |   4 +
 libgcc/config/aarch64/t-lse |  44 ++
 3 files changed, 308 insertions(+)
 create mode 100644 libgcc/config/aarch64/lse.c
 create mode 100644 libgcc/config/aarch64/t-lse

diff --git a/libgcc/config/aarch64/lse.c b/libgcc/config/aarch64/lse.c
new file mode 100644
index 000..68ca7df667b
--- /dev/null
+++ b/libgcc/config/aarch64/lse.c
@@ -0,0 +1,260 @@
+/* Out-of-line LSE atomics for AArch64 architecture.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Linaro Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/*
+ * The problem that we are trying to solve is operating system deployment
+ * of ARMv8.1-Atomics, also known as Large System Extensions (LSE).
+ *
+ * There are a number of potential solutions for this problem which have
+ * been proposed and rejected for various reasons.  To recap:
+ *
+ * (1) Multiple builds.  The dynamic linker will examine /lib64/atomics/
+ * if HWCAP_ATOMICS is set, allowing entire libraries to be overwritten.
+ * However, not all Linux distributions are happy with multiple builds,
+ * and anyway it has no effect on main applications.
+ *
+ * (2) IFUNC.  We could put these functions into libgcc_s.so, and have
+ * a single copy of each function for all DSOs.  However, ARM is concerned
+ * that the branch-to-indirect-branch that is implied by using a PLT,
+ * as required by IFUNC, is too much overhead for smaller cpus.
+ *
+ * (3) Statically predicted direct branches.  This is the approach that
+ * is taken here.  These functions are linked into every DSO that uses them.
+ * All of the symbols are hidden, so that the functions are called via a
+ * direct branch.  The choice of LSE vs non-LSE is done via one byte load
+ * followed by a well-predicted direct branch.  The functions are compiled
+ * separately to minimize code size.
+ */
+
+/* Define or declare the symbol gating the LSE implementations.  */
+#ifndef L_have_atomics
+extern
+#endif
+_Bool __aa64_have_atomics __attribute__((visibility("hidden"), nocommon));
+
+/* The branch controlled by this test should be easily predicted, in that
+   it will, after constructors, always branch the same way.  The expectation
+   is that systems that implement ARMv8.1-Atomics are "beefier" than those
+   that omit the extension.  By arranging for the fall-through path to use
+   load-store-exclusive insns, we aid the branch predictor of the
+   smallest cpus.  */
+#define have_atomics  __builtin_expect (__aa64_have_atomics, 0)
+
+#ifdef L_have_atomics
+/* Disable initialization of __aa64_have_atomics during bootstrap.  */
+# ifndef inhibit_libc
+#  include 
+/* Disable initialization if the system headers are too old.  */
+#  if defined(AT_HWCAP) && defined(HWCAP_ATOMICS)
+static void __attribute__((constructor))
+init_have_atomics (void)
+{
+  unsigned long hwcap = getauxval (AT_HWCAP);
+  __aa64_have_atomics = (hwcap & HWCAP_ATOMICS) != 0;
+}
+#  endif /* HWCAP */
+# endif /* inhibit_libc */
+#else
+
+/* Tell the assembler to accept LSE instructions.  */
+asm(".arch armv8-a+lse");
+
+/* Turn size and memory model defines into mnemonic fragments.  */
+#if SIZE == 1
+# define S "b"
+# define MASK  ", uxtb"
+#elif SIZE == 2
+# define S "h"
+# define MASK  ", uxth"
+#elif SIZE == 4 || SIZE == 8
+# define S ""
+# define MASK  ""
+#else
+# error
+#endif
+
+#if SIZE < 8
+# define T  unsigned int
+# define W  "w"
+#else
+# define T  unsigned long long
+# define W  ""
+#endif
+
+#if MODEL == 1
+# define SUFF  _relax
+# define A ""
+# define L ""
+#elif MODEL == 2
+# define SUFF  _acq
+# define A "a"
+# define L ""
+#elif MODEL == 3
+# define SUFF  _rel
+# define A ""
+# define L "l"
+#elif MODEL == 4
+# define SUFF  _acq_rel
+# define 
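
To make the rationale above a little more concrete, every helper has the
shape sketched below: one hidden byte set by a constructor selects, via a
well-predicted branch, between the single LSE instruction and the
load-exclusive/store-exclusive fallback.  This is a hand-written C sketch of
the control flow only (the names are illustrative; the real helpers are
built per SIZE/MODEL pair and written with inline asm):

/* Sketch of the dispatch pattern used by each out-of-line helper.  */
static int have_lse;   /* would be set from getauxval (AT_HWCAP) & HWCAP_ATOMICS */

unsigned int example_swp4_relax (unsigned int newval, unsigned int *ptr)
{
  if (__builtin_expect (have_lse, 0))
    /* LSE path: a single swp instruction in the real implementation.  */
    return __atomic_exchange_n (ptr, newval, __ATOMIC_RELAXED);

  /* Fallback path: an ldxr/stxr retry loop in the real implementation.  */
  unsigned int old = __atomic_load_n (ptr, __ATOMIC_RELAXED);
  while (!__atomic_compare_exchange_n (ptr, &old, newval, /* weak */ 1,
                                       __ATOMIC_RELAXED, __ATOMIC_RELAXED))
    ;  /* OLD is refreshed by the failed compare-and-swap.  */
  return old;
}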

[PATCH, AArch64 v2 09/11] aarch64: Force TImode values into even registers

2018-10-02 Thread Richard Henderson
The LSE CASP instruction requires values to be placed in even
register pairs.  A solution involving two additional register
classes was rejected in favor of the much simpler solution of
simply requiring all TImode values to be aligned.

* config/aarch64/aarch64.c (aarch64_hard_regno_mode_ok): Force
16-byte modes held in GP registers to use an even regno.
---
 gcc/config/aarch64/aarch64.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 49b47382b5d..ce4d7e51d00 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1451,10 +1451,14 @@ aarch64_hard_regno_mode_ok (unsigned regno, 
machine_mode mode)
   if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM)
 return mode == Pmode;
 
-  if (GP_REGNUM_P (regno) && known_le (GET_MODE_SIZE (mode), 16))
-return true;
-
-  if (FP_REGNUM_P (regno))
+  if (GP_REGNUM_P (regno))
+{
+  if (known_le (GET_MODE_SIZE (mode), 8))
+   return true;
+  else if (known_le (GET_MODE_SIZE (mode), 16))
+   return (regno & 1) == 0;
+}
+  else if (FP_REGNUM_P (regno))
 {
   if (vec_flags & VEC_STRUCT)
return end_hard_regno (mode, regno) - 1 <= V31_REGNUM;
-- 
2.17.1



[PATCH, AArch64 v2 08/11] aarch64: Implement -matomic-ool

2018-10-02 Thread Richard Henderson
* config/aarch64/aarch64.opt (-matomic-ool): New.
* config/aarch64/aarch64.c (aarch64_atomic_ool_func): New.
(aarch64_ool_cas_names, aarch64_ool_swp_names): New.
(aarch64_ool_ldadd_names, aarch64_ool_ldset_names): New.
(aarch64_ool_ldclr_names, aarch64_ool_ldeor_names): New.
(aarch64_ool_stadd_names, aarch64_ool_stset_names): New.
(aarch64_ool_stclr_names, aarch64_ool_steor_names): New.
(aarch64_expand_compare_and_swap): Honor TARGET_ATOMIC_OOL.
* config/aarch64/atomics.md (atomic_exchange): Likewise.
(atomic_): Likewise.
(atomic_fetch_): Likewise.
(atomic__fetch): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h   | 17 
 gcc/config/aarch64/aarch64.c  | 95 +++
 .../atomic-comp-swap-release-acquire.c|  2 +-
 .../gcc.target/aarch64/atomic-op-acq_rel.c|  2 +-
 .../gcc.target/aarch64/atomic-op-acquire.c|  2 +-
 .../gcc.target/aarch64/atomic-op-char.c   |  2 +-
 .../gcc.target/aarch64/atomic-op-consume.c|  2 +-
 .../gcc.target/aarch64/atomic-op-imm.c|  2 +-
 .../gcc.target/aarch64/atomic-op-int.c|  2 +-
 .../gcc.target/aarch64/atomic-op-long.c   |  2 +-
 .../gcc.target/aarch64/atomic-op-relaxed.c|  2 +-
 .../gcc.target/aarch64/atomic-op-release.c|  2 +-
 .../gcc.target/aarch64/atomic-op-seq_cst.c|  2 +-
 .../gcc.target/aarch64/atomic-op-short.c  |  2 +-
 .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |  2 +-
 .../atomic_cmp_exchange_zero_strong_1.c   |  2 +-
 .../gcc.target/aarch64/sync-comp-swap.c   |  2 +-
 .../gcc.target/aarch64/sync-op-acquire.c  |  2 +-
 .../gcc.target/aarch64/sync-op-full.c |  2 +-
 gcc/config/aarch64/aarch64.opt|  4 +
 gcc/config/aarch64/atomics.md | 94 --
 gcc/doc/invoke.texi   | 14 ++-
 22 files changed, 232 insertions(+), 26 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 1d2f8487d1a..c7b96b12bbe 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -624,4 +624,21 @@ poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
 bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
+struct atomic_ool_names
+{
+const char *str[4][4];
+};
+
+rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
+   const atomic_ool_names *names);
+extern const atomic_ool_names aarch64_ool_swp_names;
+extern const atomic_ool_names aarch64_ool_stadd_names;
+extern const atomic_ool_names aarch64_ool_stset_names;
+extern const atomic_ool_names aarch64_ool_stclr_names;
+extern const atomic_ool_names aarch64_ool_steor_names;
+extern const atomic_ool_names aarch64_ool_ldadd_names;
+extern const atomic_ool_names aarch64_ool_ldset_names;
+extern const atomic_ool_names aarch64_ool_ldclr_names;
+extern const atomic_ool_names aarch64_ool_ldeor_names;
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 867759f7e80..49b47382b5d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14160,6 +14160,90 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
 }
 
+/* We store the names of the various atomic helpers in a 4x4 array.
+   Return the libcall function given MODE, MODEL and NAMES.  */
+
+rtx
+aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
+   const atomic_ool_names *names)
+{
+  memmodel model = memmodel_base (INTVAL (model_rtx));
+  int mode_idx, model_idx;
+
+  switch (mode)
+{
+case E_QImode:
+  mode_idx = 0;
+  break;
+case E_HImode:
+  mode_idx = 1;
+  break;
+case E_SImode:
+  mode_idx = 2;
+  break;
+case E_DImode:
+  mode_idx = 3;
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  switch (model)
+{
+case MEMMODEL_RELAXED:
+  model_idx = 0;
+  break;
+case MEMMODEL_CONSUME:
+case MEMMODEL_ACQUIRE:
+  model_idx = 1;
+  break;
+case MEMMODEL_RELEASE:
+  model_idx = 2;
+  break;
+case MEMMODEL_ACQ_REL:
+case MEMMODEL_SEQ_CST:
+  model_idx = 3;
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  return init_one_libfunc_visibility (names->str[mode_idx][model_idx],
+ VISIBILITY_HIDDEN);
+}
+
+#define DEF0(B, N) \
+  { "__aa64_" #B #N "_relax", \
+"__aa64_" #B #N "_acq", \
+"__aa64_" #B #N "_rel", \
+"__aa64_" #B #N "_acq_rel" }
+
+#define DEF4(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8)
+
+static const atomic_ool_names aarch64_ool_cas_names = { { DEF4(cas) } };
+const atomic_ool_names aarch64_ool_swp_names = { { DEF4(swp) } };
+const atomic_ool_names aarch64_ool_ldadd_names = { { DEF4(ldadd) } };
+const atomic_ool_names aarch64_ool_l
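
For reference, DEF0/DEF4 expand into a 4x4 table of helper names, sizes down
the rows and memory models across the columns; aarch64_atomic_ool_func ()
indexes it with the mode_idx/model_idx values computed above.  Written out
by hand for the SWP helpers (transcribed from the macros, so consult the
patch itself for the authoritative spellings):

static const char *const swp_names[4][4] = {
  /* QImode */ { "__aa64_swp1_relax", "__aa64_swp1_acq", "__aa64_swp1_rel", "__aa64_swp1_acq_rel" },
  /* HImode */ { "__aa64_swp2_relax", "__aa64_swp2_acq", "__aa64_swp2_rel", "__aa64_swp2_acq_rel" },
  /* SImode */ { "__aa64_swp4_relax", "__aa64_swp4_acq", "__aa64_swp4_rel", "__aa64_swp4_acq_rel" },
  /* DImode */ { "__aa64_swp8_relax", "__aa64_swp8_acq", "__aa64_swp8_rel", "__aa64_swp8_acq_rel" },
};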

[PATCH, AArch64 v2 10/11] aarch64: Implement TImode compare-and-swap

2018-10-02 Thread Richard Henderson
This pattern will only be used with the __sync functions, because
we do not yet have a bare TImode atomic load.

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Add support   
for NE comparison of TImode values.
(aarch64_print_operand): Extend %R to handle general registers.
(aarch64_emit_load_exclusive): Add support for TImode.
(aarch64_emit_store_exclusive): Likewise.
(aarch64_atomic_ool_func): Likewise.
(aarch64_ool_cas_names): Likewise.
* config/aarch64/atomics.md (@atomic_compare_and_swap):
Change iterator from ALLI to ALLI_TI.
(@atomic_compare_and_swap): New.
(@atomic_compare_and_swap_lse): New.
(aarch64_load_exclusive_pair): New.
(aarch64_store_exclusive_pair): New.
* config/aarch64/iterators.md (JUST_TI): New.

* config/aarch64/lse.c (cas): Add support for SIZE == 16.
* config/aarch64/t-lse (S0, O0): Split out cas.
(LSE_OBJS): Include $(O0).
---
 gcc/config/aarch64/aarch64-protos.h |  2 +-
 gcc/config/aarch64/aarch64.c| 72 ++-
 libgcc/config/aarch64/lse.c | 48 ++-
 gcc/config/aarch64/atomics.md   | 91 +++--
 gcc/config/aarch64/iterators.md |  3 +
 libgcc/config/aarch64/t-lse | 10 +++-
 6 files changed, 189 insertions(+), 37 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index c7b96b12bbe..f735c4e5ad8 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -626,7 +626,7 @@ bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
 struct atomic_ool_names
 {
-const char *str[4][4];
+const char *str[5][4];
 };
 
 rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ce4d7e51d00..ac2f055a09e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1610,10 +1610,33 @@ emit_set_insn (rtx x, rtx y)
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
-  machine_mode mode = SELECT_CC_MODE (code, x, y);
-  rtx cc_reg = gen_rtx_REG (mode, CC_REGNUM);
+  machine_mode cmp_mode = GET_MODE (x);
+  machine_mode cc_mode;
+  rtx cc_reg;
 
-  emit_set_insn (cc_reg, gen_rtx_COMPARE (mode, x, y));
+  if (cmp_mode == E_TImode)
+{
+  gcc_assert (code == NE);
+
+  cc_mode = E_CCmode;
+  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
+
+  rtx x_lo = operand_subword (x, 0, 0, TImode);
+  rtx y_lo = operand_subword (y, 0, 0, TImode);
+  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
+
+  rtx x_hi = operand_subword (x, 1, 0, TImode);
+  rtx y_hi = operand_subword (y, 1, 0, TImode);
+  emit_insn (gen_ccmpdi (cc_reg, cc_reg, x_hi, y_hi,
+gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
+GEN_INT (AARCH64_EQ)));
+}
+  else
+{
+  cc_mode = SELECT_CC_MODE (code, x, y);
+  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
+  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
+}
   return cc_reg;
 }
 
@@ -6693,7 +6716,7 @@ sizetochar (int size)
  'S/T/U/V':Print a FP/SIMD register name for a register 
list.
The register printed is the FP/SIMD register name
of X + 0/1/2/3 for S/T/U/V.
- 'R':  Print a scalar FP/SIMD register name + 1.
+ 'R':  Print a scalar Integer/FP/SIMD register name + 1.
  'X':  Print bottom 16 bits of integer constant in hex.
  'w/x':Print a general register name or the zero register
(32-bit or 64-bit).
@@ -6885,12 +6908,13 @@ aarch64_print_operand (FILE *f, rtx x, int code)
   break;
 
 case 'R':
-  if (!REG_P (x) || !FP_REGNUM_P (REGNO (x)))
-   {
- output_operand_lossage ("incompatible floating point / vector 
register operand for '%%%c'", code);
- return;
-   }
-  asm_fprintf (f, "q%d", REGNO (x) - V0_REGNUM + 1);
+  if (REG_P (x) && FP_REGNUM_P (REGNO (x)))
+   asm_fprintf (f, "q%d", REGNO (x) - V0_REGNUM + 1);
+  else if (REG_P (x) && GP_REGNUM_P (REGNO (x)))
+   asm_fprintf (f, "x%d", REGNO (x) - R0_REGNUM + 1);
+  else
+   output_operand_lossage ("incompatible register operand for '%%%c'",
+   code);
   break;
 
 case 'X':
@@ -14143,16 +14167,26 @@ static void
 aarch64_emit_load_exclusive (machine_mode mode, rtx rval,
 rtx mem, rtx model_rtx)
 {
-  emit_insn (gen_aarch64_load_exclusive (mode, rval, mem, model_rtx));
+  if (mode == E_TImode)
+emit_insn (gen_aarch64_load_exclusive_pair (gen_lowpart (DImode, rval),
+   gen_highpart (DImode, rval),
+   mem, model_rtx));
+  els
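
At the source level the new TImode path is reached through the __sync
builtins, as noted at the top of the message.  A minimal example of the kind
of code this lets the compiler expand inline (a sketch: what it compiles to
depends on the architecture options -- CASP with +lse, an LDXP/STXP loop
otherwise -- and without this patch it may instead require an out-of-line
__sync_*_16 routine from the runtime):

__int128 shared;

int replace_if_equal (__int128 expected, __int128 desired)
{
  return __sync_bool_compare_and_swap (&shared, expected, desired);
}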

[PATCH, AArch64 v2 11/11] Enable -matomic-ool by default

2018-10-02 Thread Richard Henderson
Do Not Merge Upstream.
This is for agraf and his testing within SLES.
---
 gcc/common/config/aarch64/aarch64-common.c | 6 --
 gcc/config/aarch64/aarch64.c   | 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/common/config/aarch64/aarch64-common.c 
b/gcc/common/config/aarch64/aarch64-common.c
index 292fb818705..3bd1312a3f8 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -31,9 +31,11 @@
 #include "flags.h"
 #include "diagnostic.h"
 
-#ifdef  TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
-#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END)
+#ifdef  TARGET_BIG_ENDIAN_DEFAULT
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END | MASK_ATOMIC_OOL)
+#else
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_ATOMIC_OOL)
 #endif
 
 #undef  TARGET_HANDLE_OPTION
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ac2f055a09e..b20d8bbf19b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17617,9 +17617,11 @@ aarch64_run_selftests (void)
 #undef TARGET_C_MODE_FOR_SUFFIX
 #define TARGET_C_MODE_FOR_SUFFIX aarch64_c_mode_for_suffix
 
-#ifdef TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
-#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END)
+#ifdef  TARGET_BIG_ENDIAN_DEFAULT
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END | MASK_ATOMIC_OOL)
+#else
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_ATOMIC_OOL)
 #endif
 
 #undef TARGET_CLASS_MAX_NREGS
-- 
2.17.1



[PATCH 0/4] avoid relying on type information in get_range_strlen

2018-10-02 Thread Martin Sebor

This patch kit changes the get_range_strlen API to a) make its
use more intuitive and less prone to misuse, and b) relax
the strlen range optimization to avoid making use of array type
sizes to constrain the upper bound of the function return value.

I broke up the changes into a series of four in an attempt to
make them easier to review although I'm not sure to what degree
this will have been successful:

[1/4] - Introduce struct strlen_data_t into gimple-fold
[2/4] - Relax strlen range optimization to avoid making assumptions
about types
[3/4] - Change sprintf to use new get_range_strlen overload
[4/4] - Replace uses of old get_range_strlen with the new one.

Some aspects of the API evolve as the series progresses (e.g.,
the function return type changes) but the basic idea/design are
consistent.

The patch kit applies on top of r264413 and there are going to
be conflicts to resolve with the recent changes in this area.
In particular, the recent addition of a c_strlen_data argument
to c_strlen will need to be reconciled with the addition of
the strlen_data_t struct in [1/4].  I think both c_strlen and
get_range_strlen (and perhaps also string_constant) should take
and populate the same data structure.


[PATCH 1/4] introduce struct strlen_data_t into gimple-fold

2018-10-02 Thread Martin Sebor

[1/4] - Introduce struct strlen_data_t into gimple-fold

This patch introduces a new data structure to reduce the number
of arguments and overloads of the get_range_strlen API.  One of
the goals of this change is to make the API safer to use for
optimization (which looks for "permissive" results to account
even for some undefined uses) vs warnings (which relies on
conservative results to avoid false positives).  The patch also
adds provides explicit arguments to get_range_strlen and adds
descriptive comments to make the affected code easier to follow.
Beyond making use of the new data structure the patch makes no
observable changes.

The changes to gcc/testsuite/gcc.dg/strlenopt-51.c fix a few
typos with no functional effect and tweak the helper macro
used by the test to make debugging slightly easier.
[1/4] - Introduce struct strlen_data_t into gimple-fold.

gcc/ChangeLog:

	* builtins.c (check_access): Document arguments to a function call.
	(check_strncat_sizes): Same.
	(expand_builtin_strncat): Same.
	* calls.c (maybe_warn_nonstring_arg): Same.
	* gimple-fold.h (struct strlen_data_t): New type.
	(get_range_strlen): New overload.
	* gimple-fold.c (struct strlen_data_t): New type.
	(get_range_strlen): Change declaration to take strlen_data_t*
	argument instead of length, flexp, eltsize, and nonstr.
	Adjust to use strlen_data_t members instead of other arguments.
	Also set strlen_data_t::maxsize to the same value as maxlen.
	(extern get_range_strlen): Define new overload.
	(get_maxval_strlen): Adjust to use strlen_data_t.
	* gimple-ssa-sprintf.c (get_string_length): Same.

gcc/testsuite/ChangeLog:
	gcc.dg/strlenopt-51.c: Add counter to macros and fix typos.
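
To make the shape of the new interface easier to see without reading the
whole header diff, here is a sketch of the aggregate as the rest of the
series uses it.  It is reconstructed from the call sites in patches
2/4-4/4, so member order, defaults and comments are illustrative rather
than copied from gimple-fold.h:

/* Sketch of strlen_data_t (illustrative).  */
struct strlen_data_t
{
  strlen_data_t (unsigned eltsize): eltsize (eltsize) { }

  tree minlen = NULL_TREE;   /* Shortest known string length.  */
  tree maxlen = NULL_TREE;   /* Longest known length; all-ones when unbounded.  */
  tree maxsize = NULL_TREE;  /* A more conservative upper bound (used for warnings).  */
  unsigned eltsize;          /* Character size: 1, 2, or 4.  */
};

A typical call then becomes the two lines seen throughout patch 4/4:

  strlen_data_t lendata (/* eltsize = */ 1);
  get_range_strlen (arg, &lendata);
  /* ... use lendata.minlen and lendata.maxsize ... */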

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3f39d10..3e31af4 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3235,7 +3235,8 @@ check_access (tree exp, tree, tree, tree dstwrite,
 	 the upper bound given by MAXREAD add one to it for
 	 the terminating nul.  Otherwise, set it to one for
 	 the same reason, or to MAXREAD as appropriate.  */
-	  get_range_strlen (srcstr, range);
+	  get_range_strlen (srcstr, range, /* eltsize = */ 1,
+			/* strict = */ false );
 	  if (range[0] && (!maxread || TREE_CODE (maxread) == INTEGER_CST))
 	{
 	  if (maxread && tree_int_cst_le (maxread, range[0]))
@@ -4107,7 +4108,7 @@ check_strncat_sizes (tree exp, tree objsize)
   /* Try to determine the range of lengths that the source expression
  refers to.  */
   tree lenrange[2];
-  get_range_strlen (src, lenrange);
+  get_range_strlen (src, lenrange, /* eltsize = */ 1, /* strict = */ false);
 
   /* Try to verify that the destination is big enough for the shortest
  string.  */
@@ -4174,12 +4175,13 @@ expand_builtin_strncat (tree exp, rtx)
   tree slen = c_strlen (src, 1);
 
   /* Try to determine the range of lengths that the source expression
- refers to.  */
+ refers to.  Since the lengths are only used for warning and not
+ for code generation disable strict mode below.  */
   tree lenrange[2];
   if (slen)
 lenrange[0] = lenrange[1] = slen;
   else
-get_range_strlen (src, lenrange);
+get_range_strlen (src, lenrange, /* eltsize = */ 1, /* strict = */ false);
 
   /* Try to verify that the destination is big enough for the shortest
  string.  First try to determine the size of the destination object
diff --git a/gcc/calls.c b/gcc/calls.c
index e9660b6..11f00ad 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1557,7 +1557,10 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
   tree bound = NULL_TREE;
 
   /* The range of lengths of a string argument to one of the comparison
- functions.  If the length is less than the bound it is used instead.  */
+ functions.  If the length is less than the bound it is used instead.
+ Since the lengths are only used for warning and not for code
+ generation disable strict mode in the calls to get_range_strlen
+ below.  */
   tree lenrng[2] = { NULL_TREE, NULL_TREE };
 
   /* It's safe to call "bounded" string functions with a non-string
@@ -1582,7 +1585,8 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
 	  {
 	tree arg = CALL_EXPR_ARG (exp, argno);
 	if (!get_attr_nonstring_decl (arg))
-	  get_range_strlen (arg, lenrng);
+	  get_range_strlen (arg, lenrng, /* eltsize = */ 1,
+/* strict = */ false);
 	  }
   }
   /* Fall through.  */
@@ -1603,7 +1607,8 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
   {
 	tree arg = CALL_EXPR_ARG (exp, 0);
 	if (!get_attr_nonstring_decl (arg))
-	  get_range_strlen (arg, lenrng);
+	  get_range_strlen (arg, lenrng, /* eltsize = */ 1,
+			/* strict = */ false);
 
 	if (nargs > 1)
 	  bound = CALL_EXPR_ARG (exp, 1);
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 1e84722..8f71e9c 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1262,11 +1262,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, tree c, tree len)
 }
 
 
-/* Obtain the minimum and ma

[PATCH 2/4] - relax strlen range optimization to avoid making assumptions about types

2018-10-02 Thread Martin Sebor

[2/4] - Relax strlen range optimization to avoid making assumptions
about types

The main part of this patch is to relax the strlen range
optimization to avoid relying on array types.  Instead, the function
either removes the upper bound of the strlen result altogether, or
constrains it by the size of referenced declaration (without
attempting to deal with common symbols).

A seemingly "big" change here is splitting up the static
get_range_strlen workhorse into two functions to make it easier
to follow.  The attached cc-9-2-gimple-fold.c.diff-b shows
the diff for the file without considering whitespace changes.

An important change to the get_range_strlen function worth calling
out is the introduction of the tight_bound local variable.
It controls whether the upper bound computed by the function is
suitable for optimization (the larger value) or warnings
(the smaller value).

Another important change here is replacing the type and fuzzy
arguments to get_range_strlen with a single enum.  The two arguments
were confusing, and not all combinations of their possible values
were meaningful.  The original extern get_range_strlen overload
with the type and fuzzy arguments is retained in this patch to
keep the changes contained.  It is removed in [4/4].

Finally, a large number of tests needed adjusting here.  I also
added a few new tests to better exercise the changes.
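
A self-contained illustration of the kind of code affected (my own example,
not one of the new tests): with the old behaviour the upper bound of the
result below could be taken from the declared type of the member array;
after this change the bound comes from the size of a referenced declaration
when one is visible, or is dropped altogether:

#include <stddef.h>

struct S { char a[4]; char b[4]; };

size_t f (struct S *p)
{
  /* Previously assumed to be at most 3 because of the type of p->a.  */
  return __builtin_strlen (p->a);
}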

[2/4] - Relax strlen range optimization to avoid making assumptions about types

gcc/ChangeLog:

	* calls.c (maybe_warn_nonstring_arg): Test for -1.
	* gimple-fold.c (get_range_strlen): Forward declare static.
	(get_range_strlen_tree): New helper.
	(get_range_strlen): Move tree code into get_range_strlen_tree.
	Replace type and fuzzy arguments with a single enum.
	Avoid optimizing ranges based on type, optimize on decl size instead.
	(get_range_strlen): New extern overload.
	(get_range_strlen): Use new overload above.
	(get_maxval_strlen): Declared static.  Assert preconditions.
	Use new overload above.
	(gimple_fold_builtin_strcpy): Adjust to pass enum.
	(gimple_fold_builtin_strncpy): Same.
	(gimple_fold_builtin_strcat): Same.
	(gimple_fold_builtin_fputs): Same.
	(gimple_fold_builtin_memory_chk): Same.
	(gimple_fold_builtin_stxcpy_chk): Same.
	(gimple_fold_builtin_stxncpy_chk): Same.
	(gimple_fold_builtin_snprintf_chk): Same.
	(gimple_fold_builtin_sprintf): Same.
	(gimple_fold_builtin_snprintf): Same.
	(gimple_fold_builtin_strlen): Simplify.  Call set_strlen_range.
	* gimple-fold.h (StrlenType): New enum.
	(get_maxval_strlen): Remove.
	(get_range_strlen): Change default argument value to true (strict).
	* tree-ssa-strlen.c (set_strlen_range): New function.
	(maybe_set_strlen_range): Call it.  Make static.
	(maybe_diag_stxncpy_trunc): Use new get_range_strlen overload.
	Adjust.
	* tree-ssa-strlen.h (set_strlen_range): Declare.

gcc/testsuite/ChangeLog:

	* g++.dg/init/strlen.C: New test.
	* gcc.c-torture/execute/strlen-5.c: New test.
	* gcc.c-torture/execute/strlen-6.c: New test.
	* gcc.c-torture/execute/strlen-7.c: New test.
	* gcc.dg/strlenopt-36.c: Remove tests for an overly aggressive
	optimization.
	* gcc.dg/strlenopt-40.c: Adjust tests to reflect a relaxed
	optimization, remove others for an overly aggressive optimization.
	* gcc.dg/strlenopt-45.c: Same.
	* gcc.dg/strlenopt-48.c: Adjust tests to reflect a relaxed
	optimization.
	* gcc.dg/strlenopt-51.c: Same.
	* gcc.dg/strlenopt-59.c: New test.

diff --git a/gcc/calls.c b/gcc/calls.c
index 11f00ad..617757e6 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1650,7 +1650,7 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
 	}
 }
 
-  if (lenrng[1] && TREE_CODE (lenrng[1]) == INTEGER_CST)
+  if (lenrng[1] && !integer_all_onesp (lenrng[1]))
 {
   /* Add one for the nul.  */
   lenrng[1] = const_binop (PLUS_EXPR, TREE_TYPE (lenrng[1]),
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 8f71e9c..eb77065 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -66,6 +66,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vector-builder.h"
 #include "tree-ssa-strlen.h"
 
+static bool
+get_range_strlen (tree, bitmap *, StrlenType, strlen_data_t *);
+
 /* Return true when DECL can be referenced from current unit.
FROM_DECL (if non-null) specify constructor of variable DECL was taken from.
We can get declarations that are not possible to reference for various
@@ -1261,224 +1264,290 @@ gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, tree c, tree len)
   return true;
 }
 
-
-/* Try to obtain the range of the lengths of the string(s) referenced
-   by ARG, or the size of the largest array ARG referes to if the range
-   of lengths cannot be determined, and store all in *PDATA.
-   If ARG is an SSA name variable, follow its use-def chains.  When
-   TYPE == 0, then if PDATA->MAXLEN is not equal to the determined
-   length or if we are unable to determine the length or value, return
-   false.
-   VISITED is a bitmap of visited varia

[PATCH 3/4] - Change sprintf to use new get_range_strlen overload

2018-10-02 Thread Martin Sebor

[3/4] - Change sprintf to use new get_range_strlen overload

This change makes use of the new get_range_strlen() overload
in gimple-ssa-sprintf.c.  This necessitated a few changes to
the API but also enabled the removal of the flexarray member
from strlen_data_t.

This patch also restores the bool return value for the public
get_range_strlen function, but with a different meaning (to
indicate whether the computed range is suitable as is to rely
on for optimization, rather than whether the argument may
refer to a flexible array member).

The changes to gimple-ssa-sprintf.c involve more indentation
adjustments than new functionality, so to make the review easier
I attach gcc-9-3-gimple-ssa-sprintf.c.diff-b with the white
space changes stripped.
[3/4] - Change sprintf to use new get_range_strlen overload

gcc/ChangeLog:

	* gimple-fold.c (get_range_strlen): Avoid clearing minlen after
	recursive call fails for COND_EXPR and GIMPLE_PHI.
	Set minlen to ssize_type rather than size_type.  Remove flexarray.
	(get_range_strlen): Return bool.
	(get_range_strlen): Return bool.  Clear minmaxlen[0].
	* gimple-fold.h (strlen_data_t): Remove member.  Update comments.
	(get_range_strlen): Return bool.
	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Reset lenrange[0]
	when maxlen is unbounded.
	* gimple-ssa-sprintf.c (get_string_length): Call new overload of
	get_range_strlen.  Adjust max, likely, and unlikely counters for
	strings of unbounded lengths.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/builtin-snprintf-4.c: New test.
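
The consumer side in gimple-ssa-sprintf.c is easiest to picture from a
user-level call.  A sketch of the scenario the adjusted max/likely/unlikely
counters cover (my own example, not the new builtin-snprintf-4.c test):

char buf[8];

void g (const char *s)
{
  /* get_string_length () asks get_range_strlen () about S; when nothing
     is known, the %s directive's maximum output is now treated as
     unbounded and the pass's likely/unlikely estimates are adjusted
     accordingly.  */
  __builtin_snprintf (buf, sizeof buf, "%s", s);
}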

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index eb77065..bfa9f17 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1353,12 +1353,6 @@ get_range_strlen_tree (tree arg, bitmap *visited, StrlenType type,
 	  /* Set the minimum size to zero since the string in
 	 the array could have zero length.  */
 	  pdata->minlen = ssize_int (0);
-
-	  if (TREE_CODE (TREE_OPERAND (arg, 0)) == COMPONENT_REF
-	  && optype == TREE_TYPE (TREE_OPERAND (arg, 0))
-	  && array_at_struct_end_p (TREE_OPERAND (arg, 0)))
-	pdata->flexarray = true;
-
 	  tight_bound = true;
 	}
   else if (TREE_CODE (arg) == COMPONENT_REF
@@ -1370,11 +1364,7 @@ get_range_strlen_tree (tree arg, bitmap *visited, StrlenType type,
 	 optimistic if the array itself isn't NUL-terminated and
 	 the caller relies on the subsequent member to contain
 	 the NUL but that would only be considered valid if
-	 the array was the last member of a struct.
-	 Set *FLEXP to true if the array whose bound is being
-	 used is at the end of a struct.  */
-	  if (array_at_struct_end_p (arg))
-	pdata->flexarray = true;
+	 the array was the last member of a struct.  */
 
 	  tree fld = TREE_OPERAND (arg, 1);
 
@@ -1581,12 +1571,14 @@ get_range_strlen (tree arg, bitmap *visited, StrlenType type,
 		{
 		  if (type != StringSizeRange)
 		return false;
-		  /* Set the length range to the maximum to prevent
+		  /* Set the upper bound to the maximum to prevent
 		 it from being adjusted in the next iteration but
-		 leave the more conservative MAXSIZE determined
-		 so far alone (or leave it null if it hasn't been
-		 set yet).  */
-		  pdata->minlen = size_zero_node;
+		 leave MINLEN and the more conservative MAXSIZE
+		 determined so far alone (or leave them null if
+		 they haven't been set yet).  That the MINLEN is
+		 in fact zero can be determined from MAXLEN being
+		 unbounded but the discovered minimum is used for
+		 diagnostics.  */
 		  pdata->maxlen = build_all_ones_cst (size_type_node);
 		}
 	return true;
@@ -1613,12 +1605,14 @@ get_range_strlen (tree arg, bitmap *visited, StrlenType type,
 	  {
 		if (type != StringSizeRange)
 		  return false;
-		  /* Set the length range to the maximum to prevent
-		 it from being adjusted in subsequent iterations
-		 but leave the more conservative MAXSIZE determined
-		 so far alone (or leave it null if it hasn't been
-		 set yet).  */
-		pdata->minlen = size_zero_node;
+		  /* Set the upper bound to the maximum to prevent
+		 it from being adjusted in the next iteration but
+		 leave MINLEN and the more conservative MAXSIZE
+		 determined so far alone (or leave them null if
+		 they haven't been set yet).  That the MINLEN is
+		 in fact zero can be determined from MAXLEN being
+		 unbounded but the discovered minimum is used for
+		 diagnostics.  */
 		pdata->maxlen = build_all_ones_cst (size_type_node);
 	  }
   }
@@ -1631,9 +1625,14 @@ get_range_strlen (tree arg, bitmap *visited, StrlenType type,
 
 /* Try to obtain the range of the lengths of the string(s) referenced
by ARG, or the size of the largest array ARG refers to if the range
-   of lengths cannot be determined, and store all in *PDATA.  */
+   of lengths cannot be determined, and store all in *PDATA.
+   Return true if the range [PDATA->MINLEN, PDATA->MAXLEN] is suitable
+

[PATCH 4/4] - Replace uses of old get_range_strlen with the new one.

2018-10-02 Thread Martin Sebor

[4/4] - Replace uses of old get_range_strlen with the new one.

This change switches the remaining get_range_strlen() callers
to use the new overload of the function that takes a strlen_data_t
argument.  There are no functional changes here.
[4/4] - Replace uses of old get_range_strlen with the new one.

gcc/ChangeLog:

	* builtins.c (check_access): Use new get_range_strlen.
	(check_strncat_sizes): Ditto.
	(expand_builtin_strncat): Ditto.
	* calls.c (maybe_warn_nonstring_arg): Ditto.
	* gimple-fold.h (get_range_strlen): Remove unused overload.
	* gimple-fold.c (get_range_strlen): Ditto.
	(gimple_fold_builtin_strlen): Use new get_range_strlen.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3e31af4..daa520b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3235,8 +3235,10 @@ check_access (tree exp, tree, tree, tree dstwrite,
 	 the upper bound given by MAXREAD add one to it for
 	 the terminating nul.  Otherwise, set it to one for
 	 the same reason, or to MAXREAD as appropriate.  */
-	  get_range_strlen (srcstr, range, /* eltsize = */ 1,
-			/* strict = */ false );
+	  strlen_data_t lendata (/* eltsize = */ 1);
+	  get_range_strlen (srcstr, &lendata);
+	  range[0] = lendata.minlen;
+	  range[1] = lendata.maxsize;
 	  if (range[0] && (!maxread || TREE_CODE (maxread) == INTEGER_CST))
 	{
 	  if (maxread && tree_int_cst_le (maxread, range[0]))
@@ -4107,8 +4109,8 @@ check_strncat_sizes (tree exp, tree objsize)
 
   /* Try to determine the range of lengths that the source expression
  refers to.  */
-  tree lenrange[2];
-  get_range_strlen (src, lenrange, /* eltsize = */ 1, /* strict = */ false);
+  strlen_data_t lendata (/* eltsize = */ 1);
+  get_range_strlen (src, &lendata);
 
   /* Try to verify that the destination is big enough for the shortest
  string.  */
@@ -4122,8 +4124,8 @@ check_strncat_sizes (tree exp, tree objsize)
 }
 
   /* Add one for the terminating nul.  */
-  tree srclen = (lenrange[0]
-		 ? fold_build2 (PLUS_EXPR, size_type_node, lenrange[0],
+  tree srclen = (lendata.minlen
+		 ? fold_build2 (PLUS_EXPR, size_type_node, lendata.minlen,
 size_one_node)
 		 : NULL_TREE);
 
@@ -4177,11 +4179,13 @@ expand_builtin_strncat (tree exp, rtx)
   /* Try to determine the range of lengths that the source expression
  refers to.  Since the lengths are only used for warning and not
  for code generation disable strict mode below.  */
-  tree lenrange[2];
-  if (slen)
-lenrange[0] = lenrange[1] = slen;
-  else
-get_range_strlen (src, lenrange, /* eltsize = */ 1, /* strict = */ false);
+  tree maxlen = slen;
+  if (!maxlen)
+{
+  strlen_data_t lendata (/* eltsize = */ 1);
+  get_range_strlen (src, &lendata);
+  maxlen = lendata.maxsize;
+}
 
   /* Try to verify that the destination is big enough for the shortest
  string.  First try to determine the size of the destination object
@@ -4189,8 +4193,8 @@ expand_builtin_strncat (tree exp, rtx)
   tree destsize = compute_objsize (dest, warn_stringop_overflow - 1);
 
   /* Add one for the terminating nul.  */
-  tree srclen = (lenrange[0]
-		 ? fold_build2 (PLUS_EXPR, size_type_node, lenrange[0],
+  tree srclen = (maxlen
+		 ? fold_build2 (PLUS_EXPR, size_type_node, maxlen,
 size_one_node)
 		 : NULL_TREE);
 
diff --git a/gcc/calls.c b/gcc/calls.c
index 617757e6..2c1cd1b 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1556,12 +1556,11 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
   /* The bound argument to a bounded string function like strncpy.  */
   tree bound = NULL_TREE;
 
-  /* The range of lengths of a string argument to one of the comparison
+  /* The longest known or possible string argument to one of the comparison
  functions.  If the length is less than the bound it is used instead.
- Since the lengths are only used for warning and not for code
- generation disable strict mode in the calls to get_range_strlen
- below.  */
-  tree lenrng[2] = { NULL_TREE, NULL_TREE };
+ Since the length is only used for warning and not for code generation
+ disable strict mode in the calls to get_range_strlen below.  */
+  tree maxlen = NULL_TREE;
 
   /* It's safe to call "bounded" string functions with a non-string
  argument since the functions provide an explicit bound for this
@@ -1581,12 +1580,15 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
 	   and to adjust the range of the bound of the bounded ones.  */
 	for (unsigned argno = 0;
 	 argno < MIN (nargs, 2)
-	 && !(lenrng[1] && TREE_CODE (lenrng[1]) == INTEGER_CST); argno++)
+	   && !(maxlen && TREE_CODE (maxlen) == INTEGER_CST); argno++)
 	  {
 	tree arg = CALL_EXPR_ARG (exp, argno);
 	if (!get_attr_nonstring_decl (arg))
-	  get_range_strlen (arg, lenrng, /* eltsize = */ 1,
-/* strict = */ false);
+	  {
+		strlen_data_t lendata (/* eltsize = */ 1);
+		get_range_strlen (arg, &lendata);
+		maxlen = lendata.maxsize;
+	  }
 	  }
   }
   /* 

Re: [PATCH][IRA,LRA] Fix PR87466, all pseudos live across setjmp are spilled

2018-10-02 Thread Peter Bergner
On 10/1/18 4:25 AM, Eric Botcazou wrote:
>> Since all implementations of this hook will have to do the same, I think
>> it is better if you leave this test at the (only two) callers.  The hook
>> doesn't need an argument then, and maybe is better named something like
>> setjmp_is_normal_call?  (The original code did not test CALL_P btw).
> 
> Seconded, but I'd be even more explicit in the naming of the hook, for 
> example 
> setjmp_preserves_nonvolatile_registers or somesuch.  (And I don't think that 
> setjmp can be considered a normal call in any case since it returns twice).

Ok, here is an updated patch that renames the hook using Eric's suggestion
and keeps the scanning of the SETJMP note in the callers of the hook like
Segher wanted.

This is bootstrapping right now, but ok for trunk assuming no
regressions?

Peter

gcc/
PR rtl-optimization/87466
* target.def (setjmp_preserves_nonvolatile_regs_p): New target hook.
* doc/tm.texi.in (TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P): New hook.
* doc/tm.texi: Regenerate.
* targhooks.c (default_setjmp_preserves_nonvolatile_regs_p): New
function.
* targhooks.h (default_setjmp_preserves_nonvolatile_regs_p): Declare.
* ira-lives.c (process_bb_node_lives): Use the new target hook.
* lra-lives.c (process_bb_lives): Likewise.
* config/rs6000/rs6000.c (TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P):
Define.
(rs6000_setjmp_preserves_nonvolatile_regs_p): New function.

gcc/testsuite/
PR rtl-optimization/87466
* gcc.target/powerpc/pr87466.c: New test.

Index: gcc/target.def
===
--- gcc/target.def  (revision 264795)
+++ gcc/target.def  (working copy)
@@ -3123,6 +3123,21 @@ In order to enforce the representation o
  int, (scalar_int_mode mode, scalar_int_mode rep_mode),
  default_mode_rep_extended)
 
+ DEFHOOK
+(setjmp_preserves_nonvolatile_regs_p,
+ "On some targets, it is assumed that the compiler will spill all pseudos\n\
+  that are live across a call to @code{setjmp}, while other targets treat\n\
+  @code{setjmp} calls as normal function calls.\n\
+  \n\
+  This hook returns false if @code{setjmp} calls do not preserve all\n\
+  non-volatile registers so that gcc must spill all pseudos that are\n\
+  live across @code{setjmp} calls.  Define this to return true if the\n\
+  target does not need to spill all pseudos live across @code{setjmp} calls.\n\
+  The default implementation conservatively assumes all pseudos must be\n\
+  spilled across @code{setjmp} calls.",
+ bool, (void),
+ default_setjmp_preserves_nonvolatile_regs_p)
+
 /* True if MODE is valid for a pointer in __attribute__((mode("MODE"))).  */
 DEFHOOK
 (valid_pointer_mode,
Index: gcc/doc/tm.texi.in
===
--- gcc/doc/tm.texi.in  (revision 264795)
+++ gcc/doc/tm.texi.in  (working copy)
@@ -7509,6 +7509,8 @@ You need not define this macro if it wou
 
 @hook TARGET_MODE_REP_EXTENDED
 
+@hook TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P
+
 @defmac STORE_FLAG_VALUE
 A C expression describing the value returned by a comparison operator
 with an integral mode and stored by a store-flag instruction
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi (revision 264795)
+++ gcc/doc/tm.texi (working copy)
@@ -11008,6 +11008,19 @@ In order to enforce the representation o
 @code{mode}.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P (void)
+On some targets, it is assumed that the compiler will spill all pseudos
+  that are live across a call to @code{setjmp}, while other targets treat
+  @code{setjmp} calls as normal function calls.
+  
+  This hook returns false if @code{setjmp} calls do not preserve all
+  non-volatile registers so that gcc must spill all pseudos that are
+  live across @code{setjmp} calls.  Define this to return true if the
+  target does not need to spill all pseudos live across @code{setjmp} calls.
+  The default implementation conservatively assumes all pseudos must be
+  spilled across @code{setjmp} calls.
+@end deftypefn
+
 @defmac STORE_FLAG_VALUE
 A C expression describing the value returned by a comparison operator
 with an integral mode and stored by a store-flag instruction
Index: gcc/targhooks.c
===
--- gcc/targhooks.c (revision 264795)
+++ gcc/targhooks.c (working copy)
@@ -209,6 +209,14 @@ default_builtin_setjmp_frame_value (void
   return virtual_stack_vars_rtx;
 }
 
+/* The default implementation of TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P.  
*/
+
+bool
+default_setjmp_preserves_nonvolatile_regs_p (void)
+{
+  return false;
+}
+
 /* Generic hook that takes a CUMULATIVE_ARGS pointer and returns false.  */
 
 bool
Index: gcc/targhooks.h
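
Wiring the hook up in a backend is a two-line affair.  A sketch of an
override for a hypothetical target whose setjmp is known to save and
restore all non-volatile registers (the real override of this kind is the
rs6000 one listed in the ChangeLog):

/* In the target's .c file.  */
static bool
example_setjmp_preserves_nonvolatile_regs_p (void)
{
  return true;  /* setjmp clobbers no more than an ordinary call.  */
}

#undef  TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P
#define TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P \
  example_setjmp_preserves_nonvolatile_regs_p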

Re: libgo patch committed: Update to 1.11 release

2018-10-02 Thread Ian Lance Taylor
On Fri, Sep 28, 2018 at 7:22 AM, Rainer Orth
 wrote:
>
>> I've committed a patch to update libgo to the 1.11 release.  As usual
>> for these updates, the patch is too large to attach to this e-mail
>> message.  I've attached some of the more relevant directories.  This
>> update required some minor patches to the gotools directory and the Go
>> testsuite, also included here.  Bootstrapped and ran Go testsuite on
>> x86_64-pc-linux-gnu.  Committed to mainline.
>
> I just found another issue: unlike Solaris 11, Solaris 10 lacks memmem,
> breaking the build:
>
> /vol/gcc/src/hg/trunk/local/libgo/go/internal/bytealg/bytealg.c: In function 
> 'Index':
> /vol/gcc/src/hg/trunk/local/libgo/go/internal/bytealg/bytealg.c:96:6: error: 
> implicit declaration of function 'memmem'; did you mean 'memset'? 
> [-Werror=implicit-function-declaration]
> 96 |  p = memmem(a.__values, a.__count, b.__values, b.__count);
>|  ^~
>|  memset
> /vol/gcc/src/hg/trunk/local/libgo/go/internal/bytealg/bytealg.c:96:4: error: 
> assignment to 'const byte *' {aka 'const unsigned char *'} from 'int' makes 
> pointer from integer without a cast [-Werror=int-conversion]
> 96 |  p = memmem(a.__values, a.__count, b.__values, b.__count);
>|^
> /vol/gcc/src/hg/trunk/local/libgo/go/internal/bytealg/bytealg.c: In function 
> 'IndexString':
> /vol/gcc/src/hg/trunk/local/libgo/go/internal/bytealg/bytealg.c:111:4: error: 
> assignment to 'const byte *' {aka 'const unsigned char *'} from 'int' makes 
> pointer from integer without a cast [-Werror=int-conversion]
> 111 |  p = memmem(a.str, a.len, b.str, b.len);
> |^

Thanks for the note.  This patch should fix the problem.  Bootstrapped
and ran Go tests on x86_64-pc-linux-gnu, both normally and pretending
that the system had no memmem.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 264793)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-098e36f4ddfcf50aeb34509b5f25b86d7050749c
+bde5ac90e0b4efdf3e9a4d72af4eb23250608611
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/configure.ac
===
--- libgo/configure.ac  (revision 264772)
+++ libgo/configure.ac  (working copy)
@@ -544,7 +544,7 @@ AC_CHECK_HEADERS([linux/filter.h linux/i
 
 AM_CONDITIONAL(HAVE_SYS_MMAN_H, test "$ac_cv_header_sys_mman_h" = yes)
 
-AC_CHECK_FUNCS(strerror_r strsignal wait4 mincore setenv unsetenv 
dl_iterate_phdr)
+AC_CHECK_FUNCS(strerror_r strsignal wait4 mincore setenv unsetenv 
dl_iterate_phdr memmem)
 AM_CONDITIONAL(HAVE_STRERROR_R, test "$ac_cv_func_strerror_r" = yes)
 AM_CONDITIONAL(HAVE_WAIT4, test "$ac_cv_func_wait4" = yes)
 
Index: libgo/go/internal/bytealg/bytealg.c
===
--- libgo/go/internal/bytealg/bytealg.c (revision 264648)
+++ libgo/go/internal/bytealg/bytealg.c (working copy)
@@ -10,6 +10,33 @@
 #include "runtime.h"
 #include "array.h"
 
+#ifndef HAVE_MEMMEM
+
+#define memmem goMemmem
+
+static const void *goMemmem(const void *in, size_t inl, const void *s, size_t 
sl) {
+   const char *p;
+   char first;
+   const char *stop;
+
+   if (sl == 0) {
+   return in;
+   }
+   if (inl < sl) {
+   return nil;
+   }
+   first = *(const char *)(s);
+   stop = (const char *)(in) + (inl - sl);
+   for (p = (const char *)(in); p <= stop; p++) {
+   if (*p == first && __builtin_memcmp(p + 1, (const char *)(s) + 
1, sl - 1) == 0) {
+   return (const void *)(p);
+   }
+   }
+   return nil;
+}
+
+#endif
+
 intgo Compare(struct __go_open_array, struct __go_open_array)
   __asm__(GOSYM_PREFIX "internal_bytealg.Compare")
   __attribute__((no_split_stack));


Re: [PATCH 0/2][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-10-02 Thread H.J. Lu
On Tue, Oct 2, 2018 at 8:39 AM Peter Bergner  wrote:
>
> On 10/2/18 10:32 AM, H.J. Lu wrote:
> > On Tue, Oct 2, 2018 at 7:59 AM Peter Bergner  wrote:
> >> I'm currently performing bootstrap and regtesting on powerpc64le-linux and
> >> x86_64-linux.  H.J., could you please test this patch on i686 to verify it
> >> doesn't expose any other problems there?  Otherwise, I'll take Jeff's
> >
> > I am waiting for the result of your previous patch.  I will test this one 
> > after
> > that.
>
> Great, thanks!

Your previous patch is OK.  I am testing the new patch now.

-- 
H.J.


Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-02 Thread Martin Liška

On 10/2/18 5:12 PM, Qing Zhao wrote:



On Oct 2, 2018, at 9:55 AM, Martin Liška <mli...@suse.cz> wrote:


Affected functions: 5
  __ilog2_u64/132 (include/linux/log2.h:40:5)
  ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
  ablkcipher_request_alloc.constprop.8/3198 (include/linux/crypto.h:979:82)
  helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
  helper_rfc4106_encrypt/3006 (arch/x86/crypto/aesni-intel_glue.c:939:12)
[..skipped..]


if we want to patch the function “fls64/63”, what other functions do we need to
patch, too? my guess is:


Hi.

Yes, 'Affected functions' is exactly the list of functions you want to patch.



**all the callers:
__ilog2_u64/132
ablkcipher_request_alloc/1639
helper_rfc4106_decrypt/3007
helper_rfc4106_encrypt/3006
**and:
ablkcipher_request_alloc.constprop.8/3198
is the above correct?

how do we generate a patch for ablkcipher_request_alloc.constprop.8/3198, since it’s
not a function in the source code?


Well, it's a 'static inline' function in a header file thus the function will 
be inlined in all usages.
In this particular case there's no such caller function, so you're fine.


So, for cloned functions, you have to analyze them case by case manually to see 
their callers?


No, the tool should provide a complete list of affected functions.


So,  the tool will provide the callers of the cloned routine?


No, the tool does not list callers of affected functions. But the function call ABI
is a reasonable live-patching boundary.

then we will patch the callers of the cloned routine, not the cloned routine
itself?





why not just disable ipa-cp or ipa-sra completely?


Because the optimizations create function clones, which are trackable with my
tool, so one then knows all affected functions.
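
For illustration, here is a minimal self-contained sketch (hypothetical example,
not kernel code) of how ipa-cp may produce such a .constprop clone, depending on
optimization level and heuristics:

/* clone-sketch.cc: compile with g++ -O3 -fdump-ipa-cp clone-sketch.cc and
   look for a scale.constprop.N clone in the ipa-cp dump; whether the clone
   is actually created depends on the ipa-cp heuristics.  */
#include <stdio.h>

static int __attribute__ ((noinline))
scale (int x, int factor)
{
  return x * factor;
}

int
caller_a (int x)
{
  /* Constant argument: ipa-cp may specialize this call into a
     scale.constprop.N clone, which a live patch of scale() would
     then also have to cover.  */
  return scale (x, 3);
}

int
caller_b (int x, int factor)
{
  /* Non-constant argument: keeps calling the original scale().  */
  return scale (x, factor);
}

int
main (void)
{
  printf ("%d %d\n", caller_a (5), caller_b (5, 7));
  return 0;
}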

Okay. I see.


You can disable the optimizations, but you'll miss some performance benefit
provided by the compiler.

Note that as Martin Jambor mentioned in point 2) there are also IPA 
optimizations that
do not create clones. These should be listed and eventually disabled for kernel 
live
patching.


Yes, such IPA analyses should be disabled.  We need to identify a complete list
of such analyses.


That was promised to be done by Honza Hubička. He's very skilled in IPA
optimizations and he's aware of optimizations that cause trouble for
live-patching.

Martin



thanks.

Qing





Re: [PATCH] Properly mark lambdas in GCOV (PR gcov-profile/86109).

2018-10-02 Thread Martin Liška

On 10/2/18 5:32 PM, Jeff Law wrote:

On 9/12/18 6:39 AM, Martin Liška wrote:

Hi.

This is follow-up of:
https://gcc.gnu.org/ml/gcc/2018-08/msg7.html

I've chosen to implement that with new DECL_CXX_LAMBDA_FUNCTION that
uses an empty bit in tree_function_decl.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready for trunk?

gcc/ChangeLog:

2018-09-12  Martin Liska  

PR gcov-profile/86109
* coverage.c (coverage_begin_function): Do not
mark lambdas as artificial.
* tree-core.h (struct GTY): Remove tm_clone_flag
and introduce new lambda_function.
* tree.h (DECL_CXX_LAMBDA_FUNCTION): New macro.

gcc/cp/ChangeLog:

2018-09-12  Martin Liska  

PR gcov-profile/86109
* parser.c (cp_parser_lambda_declarator_opt):
Set DECL_CXX_LAMBDA_FUNCTION for lambdas.

gcc/testsuite/ChangeLog:

2018-09-12  Martin Liska  

PR gcov-profile/86109
* g++.dg/gcov/pr86109.C: New test.


Hi.

Thanks for the review.


So the concern here is C++-isms bleeding into the language independent
nodes.  I think a name change from DECL_CXX_LAMBDA_FUNCTION to something
else would be enough to go forward.


Agree, well, then I would suggest to use DECL_LAMBDA_FUNCTION. The concept
of lambda functions is quite common in other programming languages.

Martin



jeff





Re: [PATCH] Properly mark lambdas in GCOV (PR gcov-profile/86109).

2018-10-02 Thread Jeff Law
On 10/2/18 11:14 AM, Martin Liška wrote:
> On 10/2/18 5:32 PM, Jeff Law wrote:
>> On 9/12/18 6:39 AM, Martin Liška wrote:
>>> Hi.
>>>
>>> This is follow-up of:
>>> https://gcc.gnu.org/ml/gcc/2018-08/msg7.html
>>>
>>> I've chosen to implement that with new DECL_CXX_LAMBDA_FUNCTION that
>>> uses an empty bit in tree_function_decl.
>>>
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression
>>> tests.
>>>
>>> Ready for trunk?
>>>
>>> gcc/ChangeLog:
>>>
>>> 2018-09-12  Martin Liska  
>>>
>>> PR gcov-profile/86109
>>> * coverage.c (coverage_begin_function): Do not
>>> mark lambdas as artificial.
>>> * tree-core.h (struct GTY): Remove tm_clone_flag
>>> and introduce new lambda_function.
>>> * tree.h (DECL_CXX_LAMBDA_FUNCTION): New macro.
>>>
>>> gcc/cp/ChangeLog:
>>>
>>> 2018-09-12  Martin Liska  
>>>
>>> PR gcov-profile/86109
>>> * parser.c (cp_parser_lambda_declarator_opt):
>>> Set DECL_CXX_LAMBDA_FUNCTION for lambdas.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2018-09-12  Martin Liska  
>>>
>>> PR gcov-profile/86109
>>> * g++.dg/gcov/pr86109.C: New test.
> 
> Hi.
> 
> Thanks for the review.
> 
>> So the concern here is C++-isms bleeding into the language independent
>> nodes.  I think a name change from DECL_CXX_LAMBDA_FUNCTION to something
>> else would be enough to go forward.
> 
> Agree, well, then I would suggest to use DECL_LAMBDA_FUNCTION. The concept
> of lambda functions is quite common in other programming languages.
Agreed and OK with that change.

jeff



Re: [PATCH][IRA,LRA] Fix PR87466, all pseudos live across setjmp are spilled

2018-10-02 Thread Peter Bergner
On 10/2/18 11:42 AM, Peter Bergner wrote:
> On 10/1/18 4:25 AM, Eric Botcazou wrote:
> This is currently bootstrapping right now, but ok for trunk assuming no
> regressions?
> 
> gcc/
>   PR rtl-optimization/87466
>   * target.def (setjmp_preserves_nonvolatile_regs_p): New target hook.
>   * doc/tm.texi.in (TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P): New hook.
>   * doc/tm.texi: Regenerate.
>   * targhooks.c (default_setjmp_preserves_nonvolatile_regs_p): Declare.
>   * targhooks.h (default_setjmp_preserves_nonvolatile_regs_p): New
>   function.
>   * ira-lives.c (process_bb_node_lives): Use the new target hook.
>   * lra-lives.c (process_bb_lives): Likewise.
>   * config/rs6000/rs6000.c (TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P):
>   Define.
>   (rs6000_setjmp_preserves_nonvolatile_regs_p): New function.
> 
> gcc/testsuite/
>   PR rtl-optimization/87466
>   * gcc.target/powerpc/pr87466.c: New test.

My powerpc64le-linux bootstrap and regtesting showed no regressions.

Peter



Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-02 Thread Richard Biener
On October 2, 2018 7:13:13 PM GMT+02:00, "Martin Liška"  wrote:
>On 10/2/18 5:12 PM, Qing Zhao wrote:
>> 
>>> On Oct 2, 2018, at 9:55 AM, Martin Liška > wrote:
>>
>> Affected functions: 5
>>   __ilog2_u64/132 (include/linux/log2.h:40:5)
>>   ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
>>   ablkcipher_request_alloc.constprop.8/3198
>(include/linux/crypto.h:979:82)
>>   helper_rfc4106_decrypt/3007
>(arch/x86/crypto/aesni-intel_glue.c:1016:12)
>>   helper_rfc4106_encrypt/3006
>(arch/x86/crypto/aesni-intel_glue.c:939:12)
>> [..skipped..]
>>
>>
>> if we want to patch the function “fls64/63”,  what else functions
>we need to patch, too? my guess is:
>
> Hi.
>
> Yes, 'Affected functions' is exactly the list of functions you
>want to patch.
>
>>
>> **all the callers:
>> __ilog2_u64/132
>> ablkcipher_request_alloc/1639
>> helper_rfc4106_decrypt/3007
>> helper_rfc4106_encrypt/3006
>> **and:
>> ablkcipher_request_alloc.constprop.8/3198
>> is the above correct?
>>
>> how to generate patch for
>ablkcipher_request_alloc.constprop.8/3198? since it’s not a function in
>the source code?
>
> Well, it's a 'static inline' function in a header file thus the
>function will be inlined in all usages.
> In this particular case there's no such caller function, so you're
>fine.

 So, for cloned functions, you have to analyze them case by case
>manually to see their callers?
>>>
>>> No, the tool should provide complete list of affected functions.
>> 
>> So,  the tool will provide the callers of the cloned routine?
>
>No, the tool does not list callers of affected functions. But function
>call ABI is reasonable
>live-patching boundary.
>
>then we will patch the callers of the cloned routine, Not the cloned
>routine itself?
>> 
>>>
 why not just disable ipa-cp or ipa-sra completely?
>>>
>>> Because the optimizations create function clones, which are
>trackable with my tool
>>> and one knows then all affected functions.
>> Okay. I see.
>>>
>>> You can disable the optimizations, but you'll miss some performance
>benefit provide
>>> by compiler.
>>>
>>> Note that as Martin Jambor mentioned in point 2) there are also IPA
>optimizations that
>>> do not create clones. These should be listed and eventually disabled
>for kernel live
>>> patching.
>> 
>> Yes, such IPA analyzes should be disabled.  we need to identify a
>complete list of such analyzes.
>
>That was promised to be done by Honza Hubička. He's very skilled in IPA
>optimizations and he's aware
>of optimizations that cause troubles for live-patching.

One could also compute the list of possibly affected functions from such 
semantic changes. 

Richard. 

>Martin
>
>> 
>> thanks.
>> 
>> Qing
>> 



Re: [PATCH][IRA,LRA] Fix PR87466, all pseudos live across setjmp are spilled

2018-10-02 Thread Segher Boessenkool
Hi Peter,

On Tue, Oct 02, 2018 at 11:42:19AM -0500, Peter Bergner wrote:
> +/* The default implementation of TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P. 
>  */
> +
> +bool
> +default_setjmp_preserves_nonvolatile_regs_p (void)
> +{
> +  return false;
> +}

You can just use hook_bool_void_false for this (and hook_bool_void_true
for e.g. the rs6000 implementation).


Segher


Re: [PATCH][IRA,LRA] Fix PR87466, all pseudos live across setjmp are spilled

2018-10-02 Thread Peter Bergner
On 10/2/18 1:21 PM, Segher Boessenkool wrote:
> Hi Peter,
> 
> On Tue, Oct 02, 2018 at 11:42:19AM -0500, Peter Bergner wrote:
>> +/* The default implementation of 
>> TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P.  */
>> +
>> +bool
>> +default_setjmp_preserves_nonvolatile_regs_p (void)
>> +{
>> +  return false;
>> +}
> 
> You can just use hook_bool_void_false for this (and hook_bool_void_true
> for e.g. the rs6000 implementation).

Ah, I didn't know those existed.  Ok, I'll rework it to use that, thanks.
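
For reference, a minimal sketch of what the rework looks like in a backend,
assuming the hook from this series (rs6000 being the target that wants the
non-default answer):

/* Sketch, e.g. in config/rs6000/rs6000.c: override the hook so IRA/LRA no
   longer spill every pseudo that is live across a setjmp call.  */
#undef  TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P
#define TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P hook_bool_void_true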

Peter





Re: [PATCH][IRA,LRA] Fix PR87466, all pseudos live across setjmp are spilled

2018-10-02 Thread Peter Bergner
On 10/2/18 1:21 PM, Segher Boessenkool wrote:
> On Tue, Oct 02, 2018 at 11:42:19AM -0500, Peter Bergner wrote:
>> +/* The default implementation of 
>> TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P.  */
>> +
>> +bool
>> +default_setjmp_preserves_nonvolatile_regs_p (void)
>> +{
>> +  return false;
>> +}
> 
> You can just use hook_bool_void_false for this (and hook_bool_void_true
> for e.g. the rs6000 implementation).

Yes, a much nicer and smaller patch using those functions!  So here's version 3,
which is the same as version 2 but using the above-mentioned hook functions.

This is bootstrapping right now; OK assuming no regressions?

Peter

gcc/
PR rtl-optimization/87466
* target.def (setjmp_preserves_nonvolatile_regs_p): New target hook.
* doc/tm.texi.in (TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P): New hook.
* doc/tm.texi: Regenerate.
* ira-lives.c (process_bb_node_lives): Use the new target hook.
* lra-lives.c (process_bb_lives): Likewise.
* config/rs6000/rs6000.c (TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P):
Define.

gcc/testsuite/
PR rtl-optimization/87466
* gcc.target/powerpc/pr87466.c: New test.

Index: gcc/target.def
===
--- gcc/target.def  (revision 264795)
+++ gcc/target.def  (working copy)
@@ -3123,6 +3123,21 @@ In order to enforce the representation o
  int, (scalar_int_mode mode, scalar_int_mode rep_mode),
  default_mode_rep_extended)
 
+ DEFHOOK
+(setjmp_preserves_nonvolatile_regs_p,
+ "On some targets, it is assumed that the compiler will spill all pseudos\n\
+  that are live across a call to @code{setjmp}, while other targets treat\n\
+  @code{setjmp} calls as normal function calls.\n\
+  \n\
+  This hook returns false if @code{setjmp} calls do not preserve all\n\
+  non-volatile registers so that gcc must spill all pseudos that are\n\
+  live across @code{setjmp} calls.  Define this to return true if the\n\
+  target does not need to spill all pseudos live across @code{setjmp} calls.\n\
+  The default implementation conservatively assumes all pseudos must be\n\
+  spilled across @code{setjmp} calls.",
+ bool, (void),
+ hook_bool_void_false)
+
 /* True if MODE is valid for a pointer in __attribute__((mode("MODE"))).  */
 DEFHOOK
 (valid_pointer_mode,
Index: gcc/doc/tm.texi.in
===
--- gcc/doc/tm.texi.in  (revision 264795)
+++ gcc/doc/tm.texi.in  (working copy)
@@ -7509,6 +7509,8 @@ You need not define this macro if it wou
 
 @hook TARGET_MODE_REP_EXTENDED
 
+@hook TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P
+
 @defmac STORE_FLAG_VALUE
 A C expression describing the value returned by a comparison operator
 with an integral mode and stored by a store-flag instruction
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi (revision 264795)
+++ gcc/doc/tm.texi (working copy)
@@ -11008,6 +11008,19 @@ In order to enforce the representation o
 @code{mode}.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P (void)
+On some targets, it is assumed that the compiler will spill all pseudos
+  that are live across a call to @code{setjmp}, while other targets treat
+  @code{setjmp} calls as normal function calls.
+  
+  This hook returns false if @code{setjmp} calls do not preserve all
+  non-volatile registers so that gcc must spill all pseudos that are
+  live across @code{setjmp} calls.  Define this to return true if the
+  target does not need to spill all pseudos live across @code{setjmp} calls.
+  The default implementation conservatively assumes all pseudos must be
+  spilled across @code{setjmp} calls.
+@end deftypefn
+
 @defmac STORE_FLAG_VALUE
 A C expression describing the value returned by a comparison operator
 with an integral mode and stored by a store-flag instruction
Index: gcc/ira-lives.c
===
--- gcc/ira-lives.c (revision 264795)
+++ gcc/ira-lives.c (working copy)
@@ -1207,8 +1207,9 @@ process_bb_node_lives (ira_loop_tree_nod
 call, if this function receives a nonlocal
 goto.  */
  if (cfun->has_nonlocal_label
- || find_reg_note (insn, REG_SETJMP,
-   NULL_RTX) != NULL_RTX)
+ || (!targetm.setjmp_preserves_nonvolatile_regs_p ()
+ && (find_reg_note (insn, REG_SETJMP, NULL_RTX)
+ != NULL_RTX)))
{
  SET_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj));
  SET_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
Index: gcc/lra-lives.c
===
--- gcc/lra-lives.c (revision 264795)
+++ gcc/lra-l

Re: [PATCH 0/2][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-10-02 Thread Peter Bergner
On 10/2/18 9:59 AM, Peter Bergner wrote:
> gcc/
>   PR rtl-optimization/86939
>   PR rtl-optimization/87479
>   * ira.h (copy_insn_p): New prototype.
>   * ira-lives.c (ignore_reg_for_conflicts): New static variable.
>   (make_hard_regno_dead): Don't add conflicts for register
>   ignore_reg_for_conflicts.
>   (make_object_dead): Likewise.
>   (copy_insn_p): New function.
>   (process_bb_node_lives): Set ignore_reg_for_conflicts for copies.
>   Remove special conflict handling of REAL_PIC_OFFSET_TABLE_REGNUM.
>   * lra-lives.c (ignore_reg_for_conflicts): New static variable.
>   (make_hard_regno_dead): Don't add conflicts for register
>   ignore_reg_for_conflicts.  Remove special conflict handling of
>   REAL_PIC_OFFSET_TABLE_REGNUM.  Remove now unused argument
>   check_pic_pseudo_p and update callers.
>   (mark_pseudo_dead): Don't add conflicts for register
>   ignore_reg_for_conflicts.
>   (process_bb_lives): Set ignore_reg_for_conflicts for copies.

So bootstrap and regtesting on powerpc64le-linux show no regressions.
Looking at the x86_64-linux results, I see one test suite regression:

  FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8

I have included the function that is compiled differently with and
without the patch.  Looking at the IRA rtl dumps, there is pseudo
85 that is copied from and into hard reg 5 and we now no longer
report that they conflict.  That allows pseudo 85 to now be assigned
to hard reg 5, rather than hard reg 3 (old code) and that leads
to the code differences shown below.  I don't know x86_64 mnemonics
enough to say whether the code changes below are "better" or "similar"
or ???.  H.J., can you comment on the below code gen changes?
If they're better or similar to the old code, I could just modify
the expected results for pr49095.c.

Peter



[bergner@dagger1 PR87479]$ cat pr49095.i 
void foo (void *);

int *
f1 (int *x)
{
  if (!--*x)
foo (x);
  return x;
}
[bergner@dagger1 PR87479]$ 
/data/bergner/gcc/build/gcc-fsf-mainline-pr87479-base-regtest/gcc/xgcc 
-B/data/bergner/gcc/build/gcc-fsf-mainline-pr87479-base-regtest/gcc/ -Os 
-fno-shrink-wrap -masm=att -ffat-lto-objects -fno-ident -S -o pr49095-base.s 
pr49095.i 
[bergner@dagger1 PR87479]$ 
/data/bergner/gcc/build/gcc-fsf-mainline-pr87479-regtest/gcc/xgcc 
-B/data/bergner/gcc/build/gcc-fsf-mainline-pr87479-regtest/gcc/ -Os 
-fno-shrink-wrap -masm=att -ffat-lto-objects -fno-ident -S -o pr49095-new.s 
pr49095.i 
[bergner@dagger1 PR87479]$ diff -u pr49095-base.s pr49095-new.s 
--- pr49095-base.s  2018-10-02 14:07:09.0 -0500
+++ pr49095-new.s   2018-10-02 14:07:40.0 -0500
@@ -5,16 +5,16 @@
 f1:
 .LFB0:
.cfi_startproc
+   subq$24, %rsp
+   .cfi_def_cfa_offset 32
decl(%rdi)
-   pushq   %rbx
-   .cfi_def_cfa_offset 16
-   .cfi_offset 3, -16
-   movq%rdi, %rbx
jne .L2
+   movq%rdi, 8(%rsp)
callfoo
+   movq8(%rsp), %rdi
 .L2:
-   movq%rbx, %rax
-   popq%rbx
+   movq%rdi, %rax
+   addq$24, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc



Re: C++ PATCH to implement C++20 P0892R2 - explicit(bool)

2018-10-02 Thread Marek Polacek
On Mon, Oct 01, 2018 at 07:47:10PM -0400, Jason Merrill wrote:
> On Mon, Oct 1, 2018 at 6:41 PM Marek Polacek  wrote:
> >
> > This patch implements C++20 explicit(bool), as described in:
> > .
> >
> > I tried to follow the noexcept specifier implementation where I could, which
> > made the non-template parts of this fairly easy.  To make explicit(expr) 
> > work
> > with dependent expressions, I had to add DECL_EXPLICIT_SPEC to lang_decl_fn,
> > which serves as a vessel to get the explicit-specifier to 
> > tsubst_function_decl
> > where I substitute the dependent arguments.
> 
> What's the impact of that on memory consumption?  I'm nervous about
> adding another word to most functions when it's not useful to most of
> them.  For several similar things we've been using hash tables on the
> side.

Yeah, that is a fair concern.  I'm not sure if I know of a good way to measure
it.  I took wide-int.ii and ran /usr/bin/time -v ./cc1plus; then it's roughly
Maximum resident set size (kbytes): 95020
vs. 
Maximum resident set size (kbytes): 95272
which doesn't seem too bad but I don't know if it proves anything.

If we went with the hash table, would it work like this?
1) have a hash table mapping decls (key) to explicit-specifiers
2) instead of setting DECL_EXPLICIT_SPEC put the parsed explicit-specifier
   into the table
3) in tsubst_function_decl look if the fn decl is associated with any
   explicit-specifier, if it is, substitute it, and set DECL_NONCONVERTING_P
   accordingly.
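
Something along the lines of this minimal sketch, ignoring GTY/GC details
(names are hypothetical, not an actual implementation):

/* Side table mapping a function decl to its (possibly dependent)
   explicit-specifier, instead of a new word in lang_decl_fn.
   Hypothetical names; garbage-collection handling omitted.  */
static hash_map<tree, tree> *explicit_spec_map;

static void
store_explicit_spec (tree fndecl, tree spec)
{
  if (!explicit_spec_map)
    explicit_spec_map = new hash_map<tree, tree>;
  explicit_spec_map->put (fndecl, spec);
}

static tree
lookup_explicit_spec (tree fndecl)
{
  if (!explicit_spec_map)
    return NULL_TREE;
  tree *slot = explicit_spec_map->get (fndecl);
  return slot ? *slot : NULL_TREE;
}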

> > +/* Create a representation of the explicit-specifier with
> > +   constant-expression of EXPR.  COMPLAIN is as for tsubst.  */
> > +
> > +tree
> > +build_explicit_specifier (tree expr, tsubst_flags_t complain)
> > +{
> > +  if (processing_template_decl && value_dependent_expression_p (expr))
> > +/* Wait for instantiation.  tsubst_function_decl will take care of it. 
> >  */
> > +return expr;
> > +
> > +  expr = perform_implicit_conversion_flags (boolean_type_node, expr,
> > +   complain, LOOKUP_NORMAL);
> > +  expr = instantiate_non_dependent_expr (expr);
> > +  expr = cxx_constant_value (expr);
> > +  return expr;
> > +}
> 
> Is there a reason not to use build_converted_constant_expr?

build_converted_constant_expr doesn't allow narrowing conversions, but we should
allow them in an explicit-specifier, which takes a "contextually converted constant
expression of type bool", much like in

constexpr int foo () { return 42; }
static_assert (foo());

Right?

Marek


Re: [PATCH 0/2][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-10-02 Thread H.J. Lu
On Tue, Oct 2, 2018 at 12:44 PM Peter Bergner  wrote:
>
> On 10/2/18 9:59 AM, Peter Bergner wrote:
> > gcc/
> >   PR rtl-optimization/86939
> >   PR rtl-optimization/87479
> >   * ira.h (copy_insn_p): New prototype.
> >   * ira-lives.c (ignore_reg_for_conflicts): New static variable.
> >   (make_hard_regno_dead): Don't add conflicts for register
> >   ignore_reg_for_conflicts.
> >   (make_object_dead): Likewise.
> >   (copy_insn_p): New function.
> >   (process_bb_node_lives): Set ignore_reg_for_conflicts for copies.
> >   Remove special conflict handling of REAL_PIC_OFFSET_TABLE_REGNUM.
> >   * lra-lives.c (ignore_reg_for_conflicts): New static variable.
> >   (make_hard_regno_dead): Don't add conflicts for register
> >   ignore_reg_for_conflicts.  Remove special conflict handling of
> >   REAL_PIC_OFFSET_TABLE_REGNUM.  Remove now unused argument
> >   check_pic_pseudo_p and update callers.
> >   (mark_pseudo_dead): Don't add conflicts for register
> >   ignore_reg_for_conflicts.
> >   (process_bb_lives): Set ignore_reg_for_conflicts for copies.
>
> So bootstrap and regtesting on powerpc64le-linux show no regressions.
> Looking at the x86_64-linux results, I see one test suite regression:
>
>   FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
>
> I have included the function that is compiled differently with and
> without the patch.  Looking at the IRA rtl dumps, there is pseudo
> 85 that is copied from and into hard reg 5 and we now no longer
> report that they conflict.  That allows pesudo 85 to now be assigned
> to hard reg 5, rather than hard reg 3 (old code) and that leads
> to the code differences shown below.  I don't know x86_64 mnemonics
> enough to say whether the code changes below are "better" or "similar"
> or ???.  H.J., can you comment on the below code gen changes?
> If they're better or similar to the old code, I could just modify
> the expected results for pr49095.c.
>
> Peter
>
>
>
> [bergner@dagger1 PR87479]$ cat pr49095.i
> void foo (void *);
>
> int *
> f1 (int *x)
> {
>   if (!--*x)
> foo (x);
>   return x;
> }
> [bergner@dagger1 PR87479]$ 
> /data/bergner/gcc/build/gcc-fsf-mainline-pr87479-base-regtest/gcc/xgcc 
> -B/data/bergner/gcc/build/gcc-fsf-mainline-pr87479-base-regtest/gcc/ -Os 
> -fno-shrink-wrap -masm=att -ffat-lto-objects -fno-ident -S -o pr49095-base.s 
> pr49095.i
> [bergner@dagger1 PR87479]$ 
> /data/bergner/gcc/build/gcc-fsf-mainline-pr87479-regtest/gcc/xgcc 
> -B/data/bergner/gcc/build/gcc-fsf-mainline-pr87479-regtest/gcc/ -Os 
> -fno-shrink-wrap -masm=att -ffat-lto-objects -fno-ident -S -o pr49095-new.s 
> pr49095.i
> [bergner@dagger1 PR87479]$ diff -u pr49095-base.s pr49095-new.s
> --- pr49095-base.s  2018-10-02 14:07:09.0 -0500
> +++ pr49095-new.s   2018-10-02 14:07:40.0 -0500
> @@ -5,16 +5,16 @@
>  f1:
>  .LFB0:
> .cfi_startproc
> +   subq$24, %rsp
> +   .cfi_def_cfa_offset 32
> decl(%rdi)
> -   pushq   %rbx
> -   .cfi_def_cfa_offset 16
> -   .cfi_offset 3, -16
> -   movq%rdi, %rbx
> jne .L2
> +   movq%rdi, 8(%rsp)
> callfoo
> +   movq8(%rsp), %rdi
>  .L2:
> -   movq%rbx, %rax
> -   popq%rbx
> +   movq%rdi, %rax
> +   addq$24, %rsp
> .cfi_def_cfa_offset 8
> ret
> .cfi_endproc
>

I saw the same failures:

FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8

I think the new ones are better, especially in 32-bit case:

Old:

[hjl@gnu-cfl-1 gcc]$ ./xgcc -B./
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/pr49095.c
-m32 -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -Os -fno-shrink-wrap -masm=att -mregparm=2
-ffat-lto-objects -fno-ident -S -o pr49095.s
[hjl@gnu-cfl-1 gcc]$ wc -l pr49095.s
2314 pr49095.s
[hjl@gnu-cfl-1 gcc]$

New:

[hjl@gnu-skl-1 gcc]$ ./xgcc -B./
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/pr49095.c
-m32 -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -Os -fno-shrink-wrap -masm=att -mregparm=2
-ffat-lto-objects -fno-ident -S -o pr49095.s
[hjl@gnu-skl-1 gcc]$ wc -l pr49095.s
2163 pr49095.s
[hjl@gnu-skl-1 gcc]$

-- 
H.J.


[OpenACC] initial manual deep copy in c

2018-10-02 Thread Cesar Philippidis
I've pushed the attached patch to my github trunk-acc-mdc branch, which
enables OpenMP 4.5 deep copy semantics in OpenACC data clauses in C. Now
GCC accepts data clauses of the form

  #pragma acc data copy(v.a[:n], v.b)

I think there are a couple of limitations in OpenMP that are going to
force me to introduce a new GOMP_MAP_ACC_STRUCT map kind. Basically,
GOMP_MAP_STRUCT reserves only the minimum amount of device storage needed
for the members actually used in a struct. OpenACC allows users to
dynamically attach and detach struct members, so GOMP_MAP_ACC_STRUCT
would need to reserve enough memory for the entire struct. This is also
necessary for cases like this:

  struct {
int *a, b, *c;
  } v;

  #pragma acc data copy(v.b)
  {
#pragma acc parallel copy(v.a[:n], v.c[:n])
  }

If the acc data directive is replaced with omp target data, and the acc
parallel replaced with omp target something, then the runtime would
crash because struct v has been partially mapped already.

Going forward, OpenACC 2.6 requires the runtime to maintain an
attachment counter to keep track of whether struct fields have been mapped.
So that's another justification for the GOMP_MAP_ACC_STRUCT type.
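
To sketch the idea (purely conceptual, hypothetical names, not the actual
libgomp data structures), the runtime would keep the full-size device copy
plus one attachment counter per pointer member:

/* Conceptual sketch only; not the libgomp implementation.  */
#include <stddef.h>

struct acc_struct_mapping
{
  void *host_base;          /* host address of the whole struct */
  void *dev_base;           /* device storage covering the full struct */
  size_t size;              /* sizeof the struct, reserved up front */
  unsigned *attach_counts;  /* one counter per pointer member, following
                               the OpenACC 2.6 attach/detach semantics */
};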

This is all early work in progress. I'm still experimenting with some
other functionality. If you check out that branch, beware it may be rebased.

Cesar
[OpenACC] Initial Manual Deep Copy

2018-10-02  Cesar Philippidis  

	gcc/c/
	* c-typeck.c (handle_omp_array_sections_1): Enable structs in acc
	data clauses.
	(c_finish_omp_clauses): Likewise.

	libgomp/
	* libgomp.h: Declare gomp_map_val.
	* oacc-parallel.c (GOACC_parallel_keyed): Use it to set devaddrs.
	* target.c (gomp_map_val): Remove static inline.
	* testsuite/libgomp.oacc-c-c++-common/deep-copy-1.c: New test.


diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 9d09b8d65fd..0428f48952a 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12605,7 +12605,6 @@ handle_omp_array_sections_1 (tree c, tree t, vec &types,
 	  return error_mark_node;
 	}
   if (TREE_CODE (t) == COMPONENT_REF
-	  && ort == C_ORT_OMP
 	  && (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	  || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_TO
 	  || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FROM))
@@ -13799,7 +13798,6 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  break;
 	}
 	  if (TREE_CODE (t) == COMPONENT_REF
-	  && (ort & C_ORT_OMP)
 	  && OMP_CLAUSE_CODE (c) != OMP_CLAUSE__CACHE_)
 	{
 	  if (DECL_BIT_FIELD (TREE_OPERAND (t, 1)))
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3a8cc2bd7d6..553d1bb81ba 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -996,6 +996,7 @@ extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *);
 extern void gomp_acc_remove_pointer (void *, size_t, bool, int, int, int);
 extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *,
    unsigned short *);
+extern uintptr_t gomp_map_val (struct target_mem_desc *, void **, size_t);
 
 extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 	  size_t, void **, void **,
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index b80ace58590..fd5bbfbdf7d 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -231,8 +231,7 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
 
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
-devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
-			+ tgt->list[i].key->tgt_offset);
+devaddrs[i] = (void *) gomp_map_val (tgt, hostaddrs, i);
 
   acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
 			  async, dims, tgt);
diff --git a/libgomp/target.c b/libgomp/target.c
index dda041cdbef..a87ba7cad0e 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -457,7 +457,7 @@ gomp_map_fields_existing (struct target_mem_desc *tgt, splay_tree_key n,
 	  (void *) cur_node.host_end);
 }
 
-static inline uintptr_t
+uintptr_t
 gomp_map_val (struct target_mem_desc *tgt, void **hostaddrs, size_t i)
 {
   if (tgt->list[i].key != NULL)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-1.c
new file mode 100644
index 000..d489cc645cd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-1.c
@@ -0,0 +1,25 @@
+#include 
+#include 
+
+struct dc
+{
+  int a;
+  int *b;
+};
+
+int
+main ()
+{
+  int n = 100, i;
+  struct dc v = { .a = 3, .b = (int *) malloc (sizeof (int) * n) };
+
+#pragma omp target teams distribute parallel for map(tofrom:v.a, v.b[:n])
+#pragma acc parallel loop copy(v.a, v.b[:n])
+  for (i = 0; i < n; i++)
+v.b[i] = v.a;
+
+  for (i = 0; i < 10; i++)
+printf ("%d: %d\n", i, v.b[i]);
+
+  return 0;
+}


Re: [PATCH 0/2][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-10-02 Thread Peter Bergner
On 10/2/18 4:52 PM, H.J. Lu wrote:
> I saw the same failures:
> 
> FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
> FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
> 
> I think the new ones are better, especially in 32-bit case:

Excellent!  Does the following test case patch make it so that
it PASSes again?

Peter


Index: gcc/testsuite/gcc.target/i386/pr49095.c
===
--- gcc/testsuite/gcc.target/i386/pr49095.c (revision 264793)
+++ gcc/testsuite/gcc.target/i386/pr49095.c (working copy)
@@ -73,4 +73,5 @@ G (long)
 /* { dg-final { scan-assembler-not "test\[lq\]" } } */
 /* The {f,h}{char,short,int,long}xor functions aren't optimized into
a RMW instruction, so need load, modify and store.  FIXME eventually.  */
-/* { dg-final { scan-assembler-times "\\), %" 8 } } */
+/* { dg-final { scan-assembler-times "\\), %" 57 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "\\), %" 45 { target { lp64 } } } } */



Re: [PATCH 0/2][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-10-02 Thread H.J. Lu
On Tue, Oct 2, 2018 at 3:28 PM Peter Bergner  wrote:
>
> On 10/2/18 4:52 PM, H.J. Lu wrote:
> > I saw the same failures:
> >
> > FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
> > FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
> >
> > I think the new ones are better, especially in 32-bit case:
>
> Excellent!  Does the following test case patch make it so that
> it PASSes again?
>
> Peter
>
>
> Index: gcc/testsuite/gcc.target/i386/pr49095.c
> ===
> --- gcc/testsuite/gcc.target/i386/pr49095.c (revision 264793)
> +++ gcc/testsuite/gcc.target/i386/pr49095.c (working copy)
> @@ -73,4 +73,5 @@ G (long)
>  /* { dg-final { scan-assembler-not "test\[lq\]" } } */
>  /* The {f,h}{char,short,int,long}xor functions aren't optimized into
> a RMW instruction, so need load, modify and store.  FIXME eventually.  */
> -/* { dg-final { scan-assembler-times "\\), %" 8 } } */
> +/* { dg-final { scan-assembler-times "\\), %" 57 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "\\), %" 45 { target { lp64 } } } } */

  ^ This is wrong.

It should be not ia32.  Otherwise, it will skip x32.

-- 
H.J.


Re: [PATCH 0/2][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-10-02 Thread Peter Bergner
On 10/2/18 7:17 PM, H.J. Lu wrote:
>> Index: gcc/testsuite/gcc.target/i386/pr49095.c
>> ===
>> --- gcc/testsuite/gcc.target/i386/pr49095.c (revision 264793)
>> +++ gcc/testsuite/gcc.target/i386/pr49095.c (working copy)
>> @@ -73,4 +73,5 @@ G (long)
>>  /* { dg-final { scan-assembler-not "test\[lq\]" } } */
>>  /* The {f,h}{char,short,int,long}xor functions aren't optimized into
>> a RMW instruction, so need load, modify and store.  FIXME eventually.  */
>> -/* { dg-final { scan-assembler-times "\\), %" 8 } } */
>> +/* { dg-final { scan-assembler-times "\\), %" 57 { target { ia32 } } } } */
>> +/* { dg-final { scan-assembler-times "\\), %" 45 { target { lp64 } } } } */
> 
>   ^ This is wrong.
> 
> It should be not ia32.  Otherwise, it will skip x32.

Ok, I changed it, thanks.

Peter


Index: gcc/testsuite/gcc.target/i386/pr49095.c
===
--- gcc/testsuite/gcc.target/i386/pr49095.c (revision 264793)
+++ gcc/testsuite/gcc.target/i386/pr49095.c (working copy)
@@ -73,4 +73,5 @@ G (long)
 /* { dg-final { scan-assembler-not "test\[lq\]" } } */
 /* The {f,h}{char,short,int,long}xor functions aren't optimized into
a RMW instruction, so need load, modify and store.  FIXME eventually.  */
-/* { dg-final { scan-assembler-times "\\), %" 8 } } */
+/* { dg-final { scan-assembler-times "\\), %" 57 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "\\), %" 45 { target { ! ia32 } } } } */




[PATCH] RISC-V: Fix unordered float compare for Signaling NaN.

2018-10-02 Thread Kito Cheng
Hi Jim:

This patch fixes the wrong behavior of unordered float compares for
signaling NaN: the current implementation suppresses the FP exception
flags unconditionally, but a signaling NaN should signal an exception
according to the IEEE 754-2008 spec.

I've added a test for that; the full test is included in glibc/math,
and the glibc/math testsuite report is attached.

Tested with RV32GC and RV64GC Linux gcc and the glibc testsuite; no new
failures were introduced.


subdir-tests.sum
Description: Binary data
From 320ceae90c1d70e2254ee84ea30d89ed32bfb9da Mon Sep 17 00:00:00 2001
From: Monk Chiang 
Date: Fri, 14 Sep 2018 16:25:45 +0800
Subject: [PATCH] RISC-V: Fix unordered float compare for signaling NaN.

 - The old implementation suppressed the FP exception flags unconditionally,
   however a signaling NaN should signal an exception according to the
   IEEE 754-2008 spec.

ChangeLog:

2018-10-03  Monk Chiang 
	Kito Cheng 

gcc/
	* config/riscv/riscv.md (f_quiet4):
	Handle signaling NaN correctly.
	testsuite/gcc.target/riscv/fcompare_snan.c: New file.
---
 gcc/config/riscv/riscv.md | 41 -
 .../gcc.target/riscv/fcompare_snan.c  | 45 +++
 2 files changed, 84 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/fcompare_snan.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4162dc578e8..8dd64a83c01 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1965,10 +1965,47 @@
 	 QUIET_COMPARISON))
 (clobber (match_scratch:X 3 "=&r"))]
   "TARGET_HARD_FLOAT"
-  "frflags\t%3\n\tf.\t%0,%1,%2\n\tfsflags %3"
+{
+  /* This pattern consists of 3 parts:
+ 1. Check whether the source operands contain any NaN.
+feq tmp1, src1, src1
+feq tmp2, src2, src2
+ 2. Skip the comparison if any source operand is NaN.
+and dst, tmp1, tmp2
+beqz dst, 1f
+
+ 3. Do the comparison.
+f[lt|le] dst, src1, src2
+
+ 1. Check whether the inputs contain any NaN:
+
+ Use FEQ instructions to check whether an operand is NaN by comparing it with itself.
+
+ 2. Skip the comparison if any operand is NaN.
+
+ Jump over the actual comparison instruction if one of the source operands
+ is a NaN, because the semantics of the original comparison are already
+ satisfied: the destination register is set to 0 in step 1 if any source
+ operand is NaN (both qNaN and sNaN).
+
+ According to the RISC-V User-Level ISA spec section 8.8, "FEQ.S performs a
+ quiet comparison: only signaling NaN inputs cause an Invalid Operation
+ exception.", so step 1 sets the right exception flags.
+
+ 3. Do the comparison.
+
+ At this point, neither operand is NaN, so just do the comparison.
+   */
+  return "feq.\t%0, %1, %1\n\t"
+	 "feq.\t%3, %2, %2\n\t"
+	 "and\t%0, %0, %3\n\t"
+	 "beqz\t%0, 1f\n\t"
+	 "f.\t%0,%1,%2\n\t"
+ "1:";
+}
   [(set_attr "type" "fcmp")
(set_attr "mode" "")
-   (set (attr "length") (const_int 12))])
+   (set (attr "length") (const_int 20))])
 
 (define_insn "*seq_zero_"
   [(set (match_operand:GPR   0 "register_operand" "=r")
diff --git a/gcc/testsuite/gcc.target/riscv/fcompare_snan.c b/gcc/testsuite/gcc.target/riscv/fcompare_snan.c
new file mode 100644
index 000..9dd059ddd08
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/fcompare_snan.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fenv_exceptions } */
+/* { dg-options "-O" } */
+
+#include 
+#include 
+#include 
+
+#include 
+
+volatile double minus_zero = -0.0;
+
+int test(double op0, double op1, int except) __attribute__((noinline));
+int test(double op0, double op1, int except)
+{
+  int ret;
+  feclearexcept (FE_ALL_EXCEPT);
+
+  ret = __builtin_isgreater (op0, op1);
+
+  if (fetestexcept(FE_ALL_EXCEPT) != except)
+abort ();
+
+  return ret;
+}
+
+int main()
+{
+  int ret, fflags = 0;
+  volatile double qnan = __builtin_nan("");
+  volatile double snan = __builtin_nans("");
+
+  test (minus_zero, snan, FE_INVALID);
+  test (snan, minus_zero, FE_INVALID);
+  test (minus_zero, qnan, 0);
+  test (qnan, minus_zero, 0);
+  test (1.0, snan, FE_INVALID);
+  test (snan, 1.0, FE_INVALID);
+  test (1.0, qnan, 0);
+  test (qnan, 1.0, 0);
+  test (qnan, snan, FE_INVALID);
+  test (snan, qnan, FE_INVALID);
+
+  return 0;
+}
-- 
2.17.1



Re: [PATCH] RISC-V: Fix unordered float compare for Signaling NaN.

2018-10-02 Thread Jim Wilson
On Tue, Oct 2, 2018 at 7:23 PM Kito Cheng  wrote:
> This patch is fixing the wrong behavior for unordered float compare
> for Signaling NaN, current implementation will suppress the FP
> exception flags unconditionally, however signaling NaN should signal
> an exception according IEEE 754-2008 spec.

This looks related to
https://github.com/riscv/riscv-gcc/pull/119

That change wasn't committed at the time because it reduces
performance for everyone, even though only a few people depend on the
exact semantics the patch provides.  One possible solution to that is
to add a -mieee option for people that want better IEEE conformance at
the expense of performance, and then enable the extra features with
-mieee.  A few other ports already have various -mieee* options.

IEEE 754 doesn't specify how C language operators work.  My
understanding is that the RISC-V hardware and gcc port is implementing
all required features of IEEE fp as per the ISO C standard, but that
the glibc testsuite requires that some optional features be
implemented and we are missing these optional features.  So there is a
tradeoff here, as to how to implement the optional features that few
people need without hurting performance for most people that don't
need them.

I can look at your patches tomorrow.

Jim


Re: [PATCH] RISC-V: Fix unordered float compare for Signaling NaN.

2018-10-02 Thread Kito Cheng
Hi Jim:

Oh, I missed that pull request on github; Andrew's patch is better
than ours, so you can just ignore our one. However, I think this fix is
a kind of prerequisite for rv32 glibc upstreaming, because Joseph wants
a new port to have a fully passing testsuite result.
On Wed, Oct 3, 2018 at 10:46 AM Jim Wilson  wrote:
>
> On Tue, Oct 2, 2018 at 7:23 PM Kito Cheng  wrote:
> > This patch is fixing the wrong behavior for unordered float compare
> > for Signaling NaN, current implementation will suppress the FP
> > exception flags unconditionally, however signaling NaN should signal
> > an exception according IEEE 754-2008 spec.
>
> This looks related to
> https://github.com/riscv/riscv-gcc/pull/119
>
> That change wasn't committed at the time because it reduces
> performance for everyone, even though only a few people depend on the
> exact semantics the patch provides.  One possible solution to that is
> to add a -mieee option for people that want better IEEE conformance at
> the expense of performance, and then enable the extra features with
> -mieee.  A few other ports already have various -mieee* options.
>
> IEEE 754 doesn't specify how C language operators work.  My
> understanding is that the RISC-V hardware and gcc port is implementing
> all required features of IEEE fp as per the ISO C standard, but that
> the glibc testsuite requires that some optional features be
> implemented and we are missing these optional features.  So there is a
> tradeoff here, as to how to implement the optional features that few
> people need without hurting performance for most people that don't
> need them.
>
> I can look at your patches tomorrow.
>
> Jim


[PATCH] Use C++11 direct initialization in Debug associative containers

2018-10-02 Thread François Dumont

Just some code cleanup extending use of C++11 direct initialization.
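
For example, the kind of change involved, as an illustration only (not the
actual libstdc++ code):

// Build a std::pair<iterator, bool>-style return value with C++11 direct
// (list) initialization instead of spelling out the constructor calls.
#include <utility>

struct iter { int *pos; const void *owner; };

std::pair<iter, bool>
old_style (int *p, const void *self, bool ok)
{
  return std::pair<iter, bool> (iter{p, self}, ok);
}

std::pair<iter, bool>
new_style (int *p, const void *self, bool ok)
{
  return { { p, self }, ok };
}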


    * include/debug/map.h
    (map<>::emplace<>(_Args&&...)): Use C++11 direct initialization.
    (map<>::emplace_hint<>(const_iterator, _Args&&...)): Likewise.
    (map<>::insert(value_type&&)): Likewise.
    (map<>::insert<>(_Pair&&)): Likewise.
    (map<>::insert<>(const_iterator, _Pair&&)): Likewise.
    (map<>::try_emplace): Likewise.
    (map<>::insert_or_assign): Likewise.
    (map<>::insert(node_type&&)): Likewise.
    (map<>::insert(const_iterator, node_type&&)): Likewise.
    (map<>::erase(const_iterator)): Likewise.
    (map<>::erase(const_iterator, const_iterator)): Likewise.
    * include/debug/multimap.h
    (multimap<>::emplace<>(_Args&&...)): Use C++11 direct initialization.
    (multimap<>::emplace_hint<>(const_iterator, _Args&&...)): Likewise.
    (multimap<>::insert<>(_Pair&&)): Likewise.
    (multimap<>::insert<>(const_iterator, _Pair&&)): Likewise.
    (multimap<>::insert(node_type&&)): Likewise.
    (multimap<>::insert(const_iterator, node_type&&)): Likewise.
    (multimap<>::erase(const_iterator)): Likewise.
    (multimap<>::erase(const_iterator, const_iterator)): Likewise.
    * include/debug/set.h
    (set<>::emplace<>(_Args&&...)): Use C++11 direct initialization.
    (set<>::emplace_hint<>(const_iterator, _Args&&...)): Likewise.
    (set<>::insert(value_type&&)): Likewise.
    (set<>::insert<>(const_iterator, value_type&&)): Likewise.
    (set<>::insert(const_iterator, node_type&&)): Likewise.
    (set<>::erase(const_iterator)): Likewise.
    (set<>::erase(const_iterator, const_iterator)): Likewise.
    * include/debug/multiset.h
    (multiset<>::emplace<>(_Args&&...)): Use C++11 direct initialization.
    (multiset<>::emplace_hint<>(const_iterator, _Args&&...)): Likewise.
    (multiset<>::insert<>(value_type&&)): Likewise.
    (multiset<>::insert<>(const_iterator, value_type&&)): Likewise.
    (multiset<>::insert(node_type&&)): Likewise.
    (multiset<>::insert(const_iterator, node_type&&)): Likewise.
    (multiset<>::erase(const_iterator)): Likewise.
    (multiset<>::erase(const_iterator, const_iterator)): Likewise.

Tested under Linux x86_64.

Already approved in another mailing thread, so committed.

François

diff --git a/libstdc++-v3/include/debug/map.h b/libstdc++-v3/include/debug/map.h
index cbfd7c33d2f..6821fc561e4 100644
--- a/libstdc++-v3/include/debug/map.h
+++ b/libstdc++-v3/include/debug/map.h
@@ -240,8 +240,7 @@ namespace __debug
 	emplace(_Args&&... __args)
 	{
 	  auto __res = _Base::emplace(std::forward<_Args>(__args)...);
-	  return std::pair(iterator(__res.first, this),
-	   __res.second);
+	  return { { __res.first, this }, __res.second };
 	}
 
   template
@@ -249,9 +248,11 @@ namespace __debug
 	emplace_hint(const_iterator __pos, _Args&&... __args)
 	{
 	  __glibcxx_check_insert(__pos);
-	  return iterator(_Base::emplace_hint(__pos.base(),
-	  std::forward<_Args>(__args)...),
-			  this);
+	  return
+	{
+	  _Base::emplace_hint(__pos.base(), std::forward<_Args>(__args)...),
+	  this
+	};
 	}
 #endif
 
@@ -270,7 +271,7 @@ namespace __debug
   insert(value_type&& __x)
   {
 	auto __res = _Base::insert(std::move(__x));
-	return { iterator(__res.first, this), __res.second };
+	return { { __res.first, this }, __res.second };
   }
 
   template
 	insert(_Pair&& __x)
 	{
-	  std::pair<_Base_iterator, bool> __res
-	= _Base::insert(std::forward<_Pair>(__x));
-	  return std::pair(iterator(__res.first, this),
-	   __res.second);
+	  auto __res = _Base::insert(std::forward<_Pair>(__x));
+	  return { { __res.first, this }, __res.second };
 	}
 #endif
 
@@ -320,8 +319,11 @@ namespace __debug
 	insert(const_iterator __position, _Pair&& __x)
 	{
 	  __glibcxx_check_insert(__position);
-	  return iterator(_Base::insert(__position.base(),
-	std::forward<_Pair>(__x)), this);
+	  return
+	{
+	  _Base::insert(__position.base(), std::forward<_Pair>(__x)),
+	  this
+	};
 	}
 #endif
 
@@ -347,7 +349,7 @@ namespace __debug
 {
 	  auto __res = _Base::try_emplace(__k,
 	  std::forward<_Args>(__args)...);
-	  return { iterator(__res.first, this), __res.second };
+	  return { { __res.first, this }, __res.second };
 	}
 
   template 
@@ -356,7 +358,7 @@ namespace __debug
 {
 	  auto __res = _Base::try_emplace(std::move(__k),
 	  std::forward<_Args>(__args)...);
-	  return { iterator(__res.first, this), __res.second };
+	  return { { __res.first, this }, __res.second };
 	}
 
   template 
@@ -365,9 +367,12 @@ namespace __debug
 _Args&&... __args)
 {
 	  __glibcxx_check_insert(__hint);
-	  return iterator(_Base::try_emplace(__hint.base(), __k,
-	 std::forward<_Args>(__args)...),
-			  this);
+	  return
+	{
+	  _Base::try_emplace(__hint.base(), __k,
+ std::forward<_Args>(__args)...),
+	  this
+	};
 	}
 
   template 
@@ -375,9 +380,12 @@ namespace

RFC: Implementing a new API for value_range

2018-10-02 Thread Aldy Hernandez

Hi Richard.  Hi folks.

I'd like to implement a clean API that disallows direct access to any of 
the value_range internals.  My aim is a clean API with no change in 
functionality.


This is mostly a clean-up, but could also pave the way for possibly
changing the underlying implementation in the future so we can unite VRP
and the on-demand work with a single common code base.


I am quoting the main structure below to give an idea where I'd like to 
head, and am also attaching a proof-of-concept patch to tree-vrp.[hc]. 
It is untested and only builds cc1.


Ideally I'd like to evolve this to include other methods that make the 
VRP / vr-values code more readable.


Note: I have added a tree type field (m_type) to make it easy to 
determine the tree type of the range.  Right now a value_range loses
the range type if UNDEFINED or VARYING, as both min/max are NULL.  If 
there is strong objection to the extra word, we could set min/max to 
integer_zero_node in the type if UNDEFINED/VARYING.  But really, all 
this will be hidden in the API, so we could change the underlying 
representation at will.


Would you be ok with this if I continue down this path?

Thanks.
Aldy

struct GTY((for_user)) value_range
{
  value_range ();
  value_range (tree type);
  value_range (tree type, value_range_type, tree, tree, bitmap = NULL);
  value_range (const value_range &);
  bool operator== (const value_range &) const;
  bool operator!= (const value_range &) const;
  void intersect (const value_range *);
  void union_ (const value_range *);

  /* Types of value ranges.  */
  bool undefined_p () const;
  bool varying_p () const;
  bool symbolic_p () const;
  bool numeric_p () const;
  void set_undefined ();
  void set_varying ();

  /* Equivalence bitmap methods.  */
  bitmap equiv () const;
  void set_equiv (bitmap);
  void equiv_free ();
  void equiv_copy (const value_range *);
  void equiv_clear ();
  void equiv_and (const value_range *);
  void equiv_ior (const value_range *);

  /* Misc methods.  */
  tree type () const;
  bool null_p () const;
  bool may_contain_p (tree) const;
  tree singleton () const;
  void canonicalize ();
  void copy_with_equiv_update (const value_range *);
  void dump () const;

  /* Temporary accessors that should eventually be removed.  */
  enum value_range_type vrtype () const;
  tree min () const;
  tree max () const;

  /* private: These are public because of GTY stupidity.  */
  enum value_range_type m_vrtype;
  tree m_min;
  tree m_max;
  tree m_type;
  /* Set of SSA names whose value ranges are equivalent to this one.
 This set is only valid when TYPE is VR_RANGE or VR_ANTI_RANGE.  */
  bitmap m_equiv;

 private:
  void init (tree type, value_range_type, tree, tree, bitmap);
  void check ();
};
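
To give an idea of how client code would read, here is a quick usage sketch
against the proposed methods above (proof of concept only):

/* Sketch: consumers go only through the API, never through the fields.  */
static void
example (tree type, tree min, tree max)
{
  value_range vr (type, VR_RANGE, min, max);

  if (vr.undefined_p () || vr.varying_p ())
    return;

  if (tree val = vr.singleton ())
    {
      /* The range collapsed to the single value VAL.  */
    }

  value_range other (type);
  other.set_varying ();
  vr.intersect (&other);   /* A no-op here, but shows the call form.  */
}
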
diff --git a/gcc/tree-vrp.h b/gcc/tree-vrp.h
index 655cf055f0a..b2b9f971bae 100644
--- a/gcc/tree-vrp.h
+++ b/gcc/tree-vrp.h
@@ -29,31 +29,177 @@ enum value_range_type { VR_UNDEFINED, VR_RANGE,
has executed.  */
 struct GTY((for_user)) value_range
 {
-  /* Lattice value represented by this range.  */
-  enum value_range_type type;
+  value_range ();
+  value_range (tree type);
+  value_range (tree type, value_range_type, tree, tree, bitmap = NULL);
+  value_range (const value_range &);
+  bool operator== (const value_range &) const;
+  bool operator!= (const value_range &) const;
+  void intersect (const value_range *);
+  void union_ (const value_range *);
 
-  /* Minimum and maximum values represented by this range.  These
- values should be interpreted as follows:
+  /* Types of value ranges.  */
+  bool undefined_p () const;
+  bool varying_p () const;
+  bool symbolic_p () const;
+  bool numeric_p () const;
+  void set_undefined ();
+  void set_varying ();
 
-	- If TYPE is VR_UNDEFINED or VR_VARYING then MIN and MAX must
-	  be NULL.
+  /* Equivalence bitmap methods.  */
+  bitmap equiv () const;
+  void set_equiv (bitmap);
+  void equiv_free ();
+  void equiv_copy (const value_range *);
+  void equiv_clear ();
+  void equiv_and (const value_range *);
+  void equiv_ior (const value_range *);
 
-	- If TYPE == VR_RANGE then MIN holds the minimum value and
-	  MAX holds the maximum value of the range [MIN, MAX].
+  /* Misc methods.  */
+  tree type () const;
+  bool null_p () const;
+  bool may_contain_p (tree) const;
+  tree singleton () const;
+  void canonicalize ();
+  void copy_with_equiv_update (const value_range *);
+  void dump () const;
 
-	- If TYPE == ANTI_RANGE the variable is known to NOT
-	  take any values in the range [MIN, MAX].  */
-  tree min;
-  tree max;
+  /* Temporary accessors that should eventually be removed.  */
+  enum value_range_type vrtype () const;
+  tree min () const;
+  tree max () const;
 
+  /* private: These are public because of GTY stupidity.  */
+  enum value_range_type m_vrtype;
+  tree m_min;
+  tree m_max;
+  tree m_type;
   /* Set of SSA names whose value ranges are equivalent to this one.
  This set is only valid when TYPE is VR_R