date:20200909

On Tue, Sep 8, 2020 at 2:33 PM Nick Clifton via Gcc-patches
 wrote:
>
> Hi Cary,
>
>   If the lto plugin encounters a file with multiple symbol sections,
>   each of which also has a v1 symbol extension section[1] then it will
>   attempt to read the extension data for *every* symbol from each of the
>   extension sections.  This results in reading off the end of a buffer
>   with the associated memory corruption that that entails.
>
>   The attached patch fixes this problem by adding a field to the
>   plugin_symtab structure which is used to indicate the last symbol that
>   was updated.  Then in parse_symtab_extensions this index is used to
>   ensure that the correct symbols are updated, in the order read.
>
>   OK to apply ?

OK for all affected branches.

Thanks,
Richard.

> Cheers
>   Nick
>
> [1] See the attached file 'j' for an example of this kind of file:
>
> lto-plugin/ChangeLog
> 2020-09-08  Nick Clifton  
>
> * lto-plugin.c (struct plugin_symtab): Add last_sym field.
> (parse_symtab_extension): Only read as many entries as are
> available in the buffer.  Store the data read into the symbol
> table indexed from last_sym.  Increment last_sym.
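
The fix described above can be sketched in a few lines of C. This is only an illustration, not the code from lto-plugin.c: the structure layouts, the two-byte record format and the name apply_extension are hypothetical stand-ins; only the last_sym idea comes from the ChangeLog.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the plugin's data structures.  */
struct sym_info { unsigned char kind; unsigned char section_kind; };

struct symtab {
  struct sym_info *syms;
  size_t nsyms;
  size_t last_sym;   /* next symbol that has not yet received extension data */
};

/* Apply one extension section holding n_entries two-byte records
   (record layout assumed for illustration).  Reads only the entries that
   are present and never writes past the end of the symbol table.
   Returns the number of symbols updated.  */
size_t apply_extension(struct symtab *tab, const unsigned char *data,
                       size_t n_entries)
{
  size_t applied = 0;
  for (size_t i = 0; i < n_entries && tab->last_sym < tab->nsyms; i++) {
    struct sym_info *s = &tab->syms[tab->last_sym++];
    s->kind = data[2 * i];
    s->section_kind = data[2 * i + 1];
    applied++;
  }
  return applied;
}
```

Because last_sym persists across calls, a second extension section continues where the first one stopped instead of starting again at symbol 0.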
>

Re: [PATCH] Practical Improvement to libgcc Complex Divide

On Tue, Sep 8, 2020 at 8:50 PM Patrick McGehearty via Gcc-patches
 wrote:
>
> (Version 4)
>
> (Added in version 4)
> Fixed Changelog entry to include __divsc3, __divdc3, __divxc3, __divtc3.
> Revised description to avoid incorrect use of "ulp (units last place)".
> Modified float precision case to use double precision when double
> precision hardware is available. Otherwise float uses the new algorithm.
> Added code to scale subnormal numerator arguments when appropriate.
> This change reduces 16 bit errors in double precision by a factor of 140.
> Revised results charts to match current version of code.
> Added background of tuning approach.
>
> Summary of Purpose
>
> The following patch to libgcc/libgcc2.c __divdc3 provides an
> opportunity to gain important improvements to the quality of answers
> for the default complex divide routine (half, float, double, extended,
> long double precisions) when dealing with very large or very small exponents.
>
> The current code correctly implements Smith's method (1962) [2]
> further modified by c99's requirements for dealing with NaN (not a
> number) results. When working with input values where the exponents
> are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
> substantially different from the answers provided by quad precision
> more than 1% of the time. This error rate may be unacceptable for many
> applications that cannot a priori restrict their computations to the
> safe range. The proposed method reduces the frequency of
> "substantially different" answers by more than 99% for double
> precision at a modest cost of performance.
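
For readers unfamiliar with Smith's method, a minimal sketch for (a + bi) / (c + di) follows. It omits the C99 NaN and infinity handling mentioned above and is not the libgcc source; the cplx type and the names smith_div and absd are hypothetical.

```c
/* Hypothetical result type for the sketch.  */
typedef struct { double re, im; } cplx;

static double absd(double x) { return x < 0 ? -x : x; }

/* Smith's method: divide through by the larger-magnitude part of the
   denominator so that c*c + d*d is never formed directly.  */
cplx smith_div(double a, double b, double c, double d)
{
  cplx q;
  double ratio, denom;

  if (absd(c) < absd(d)) {
    ratio = c / d;
    denom = c * ratio + d;
    q.re = (a * ratio + b) / denom;
    q.im = (b * ratio - a) / denom;
  } else {
    ratio = d / c;
    denom = d * ratio + c;
    q.re = (b * ratio + a) / denom;
    q.im = (b - a * ratio) / denom;
  }
  return q;
}
```

The intermediate "ratio" is the quantity discussed in the Background section below: when it underflows, the products that use it lose accuracy.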
>
> Differences between current gcc methods and the new method will be
> described. Then accuracy and performance differences will be discussed.
>
> Background
>
> This project started with an investigation related to
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  Study of Beebe[1]
> provided an overview of past and recent practice for computing complex
> divide. The current libgcc implementation is based on Robert Smith's
> algorithm [2] from 1962.  A Google search found the paper by Baudin
> and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
> proposed patch [4] is based on that paper.
>
> I developed two test sets by randomly distributing values over
> a restricted range and the full range of input values. The current
> complex divide handled the restricted range well enough, but failed on
> the full range more than 1% of the time. Baudin and Smith's primary
> test for "ratio" equals zero reduced the cases with 16 or more error
> bits by a factor of 5, but still left too many flawed answers. Adding
> debug printouts to cases with substantial errors allowed me to see the
> intermediate calculations for test values that failed. I noted that
> for many of the failures, "ratio" was a subnormal. Changing the
> "ratio" test from check for zero to check for subnormal reduced the 16
> bit error rate by another factor of 12. This single modified test
> provides the greatest benefit for the least cost, but the percentage
> of cases with greater than 16 bit errors (double precision data) is
> still greater than 0.027% (2.7 in 10,000).
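
The difference between the two "ratio" tests can be sketched as below. This is an illustration only, assuming double precision and using DBL_MIN (the smallest positive normal double) as the threshold; the function names are hypothetical and the patch's actual condition may differ.

```c
#include <float.h>

static double absd(double x) { return x < 0 ? -x : x; }

/* Zero-only test: only an exactly-zero ratio is treated specially.  */
int ratio_is_zero(double ratio)
{
  return ratio == 0.0;
}

/* Broader test: any ratio smaller in magnitude than the smallest normal
   double (zero included) is treated specially.  */
int ratio_is_subnormal(double ratio)
{
  return absd(ratio) < DBL_MIN;
}
```

A value such as DBL_MIN / 4 is nonzero, so it passes the first test unnoticed, yet it has already lost precision; the second test catches it.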
>
> Continued examination of remaining errors and their intermediate
> computations led to the various input value tests and scaling
> to avoid under/overflow. The current patch does not handle some of the
> rarest and most extreme combinations of input values, but the random
> test data is only showing 1 case in 10 million that has an error of
> greater than 12 bits. That case has 18 bits of error and is due to
> subtraction cancellation. These results are significantly better
> than the results reported by Baudin and Smith.
>
> Support for half, float, double, extended, and long double precision
> is included as all are handled with suitable preprocessor symbols in a
> single source routine. Since half precision is computed with float
> precision as per current libgcc practice, the enhanced algorithm
> provides no benefit for half precision and would cost performance.
> Therefore half precision is left unchanged.
>
> The existing constants for each precision:
> float: FLT_MAX, FLT_MIN;
> double: DBL_MAX, DBL_MIN;
> extended and/or long double: LDBL_MAX, LDBL_MIN
> are used for avoiding the more common overflow/underflow cases.
>
> When both parts of the denominator had exponents small enough to
> allow shifting any subnormal values to normal values,
> all input values could be scaled up without risking unnecessary
> overflow, gaining a clear improvement in accuracy. Similarly, when
> either numerator was subnormal and the other numerator and both
> denominator values were not too large, scaling could be used to reduce
> risk of computing with subnormals.  The test and scaling values used
> all fit within the allowed exponent range for each precision required
> by the C standard.
>
> Float precision has even more difficulty with ge

Re: [PATCH] Cygwin/MinGW: Do not version lto plugins

On Wed, Sep 9, 2020 at 2:28 AM JonY via Gcc-patches
 wrote:
>
> Hello,
>
> The lto plugins are tied to the built GCC anyway, so there isn't much
> point to versioning them.

In fact the lto plugins are not tied to the built GCCs very much, instead
we try to ensure compatibility so that a single plugin can be used with
multiple GCC versions.

> * gcc/config.host: Remove version string
> * lto-plugin/Makefile.am: Use libtool -avoid-version
> * lto-plugin/Makefile.in: Regenerate
>
> This patch has been in use with Cygwin gcc for a long time and should be
> pushed upstream. Patch OK?

The libtool docs are not very specific here but does this affect the
result for Linux ELF platforms at all?

Richard.

>

Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

On Wed, Sep 9, 2020 at 3:47 AM luoxhu  wrote:
>
>
>
> On 2020/9/8 16:26, Richard Biener wrote:
> >> It seems this is not only about pseudos.  For example, with "v = vec_insert (i, v, n);"
> >> the vector variable is stored to the stack first, so [r112:DI] is a
> >> memory operand to be processed here.  So the patch loads it from the stack (insn #10) into a
> >> temporary vector register first, and stores it to the stack again (insn #24) after
> >> rs6000_vector_set_var.
> > Hmm, yeah - I guess that's what should be addressed first then.
> > I'm quite sure that in case 'v' is not on the stack but in memory like
> > in my case a SImode store is better than what we get from
> > vec_insert - in fact vec_insert will likely introduce a RMW cycle
> > which is prone to inserting store-data-races?
>
> Yes, for your case, there is no stack operation and to_rtx is expanded
> with BLKmode instead of V4SImode.  Adding a to_rtx mode check could work around
> it.  The ASM doesn't show a store-hit-load issue.
>
> optimized:
>
> _1 = i_2(D) % 4;
> VIEW_CONVERT_EXPR(x.u)[_1] = a_4(D);
>
> expand:
> 2: r118:DI=%3:DI
> 3: r119:DI=%4:DI
> 4: NOTE_INSN_FUNCTION_BEG
> 7: r120:DI=unspec[`*.LANCHOR0',%2:DI] 47
>   REG_EQUAL `*.LANCHOR0'
> 8: r122:SI=r118:DI#0
> 9: {r124:SI=r122:SI/0x4;clobber ca:SI;}
>10: r125:SI=r124:SI<<0x2
>11: r123:SI=r122:SI-r125:SI
>   REG_EQUAL r122:SI%0x4
>12: r126:DI=sign_extend(r123:SI)
>13: r127:DI=r126:DI+0x4
>14: r128:DI=r127:DI<<0x2
>15: r129:DI=r120:DI+r128:DI
>16: [r129:DI]=r119:DI#0
>
>  p to_rtx
> $319 = (rtx_def *) (mem/c:BLK (reg/f:DI 120) [2 x+0 S32 A128])
>
> asm:
> addis 2,12,.TOC.-.LCF0@ha
> addi 2,2,.TOC.-.LCF0@l
> .localentry test,.-test
> srawi 9,3,2
> addze 9,9
> addis 10,2,.LANCHOR0@toc@ha
> addi 10,10,.LANCHOR0@toc@l
> slwi 9,9,2
> subf 9,9,3
> extsw 9,9
> addi 9,9,4
> sldi 9,9,2
> stwx 4,10,9
> blr
>
>
> >
> > So - what we need to "fix" is cfgexpand.c marking variably-indexed
> > decls as not to be expanded as registers (see
> > discover_nonconstant_array_refs).
> >
> > I guess one way forward would be to perform instruction
> > selection on GIMPLE here and transform
> >
> > VIEW_CONVERT_EXPR(D.3185)[_1] = i_6(D)
> >
> > to a (direct) internal function based on the vec_set optab.
>
> I don't quite understand what you mean here.  Do you mean:
> ALTIVEC_BUILTIN_VEC_INSERT -> VIEW_CONVERT_EXPR -> internal function -> 
> vec_set

You're writing VIEW_CONVERT_EXPR here but the outermost component
is an ARRAY_REF.  But yes, this is what I meant.
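
For context, the kind of source under discussion, a store to a vector element whose index is only known at run time, can be sketched with the GNU C vector extension. This is a hypothetical stand-in for the testcases in the thread, which are not shown here; the names v4si and set_elem are assumptions, and it relies on GCC or Clang supporting vector_size and vector subscripting.

```c
/* GNU C vector extension: four 32-bit ints in one 16-byte vector.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Store VAL into element N % 4 of a vector that lives in memory.
   Because the index is variable, the compiler cannot pick a fixed lane
   at compile time.  */
void set_elem(v4si *v, unsigned n, int val)
{
  (*v)[n % 4] = val;
}
```

Whether such a store is best expanded as a scalar store into memory or through a vec_set style pattern is exactly the trade-off debated in this thread.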

> or ALTIVEC_BUILTIN_VEC_INSERT -> internal function -> vec_set?
> And which pass to put the selection and transform is acceptable?

Close to RTL expansion.  There's gimple-isel.cc which does instruction selection
for VEC_COND_EXPRs.

> Why call it *based on* vec_set optab?  The VIEW_CONVERT_EXPR or internal
> function is expanded to vec_set optab.

Based on because we have the convenient capability to represent optabs to be
used for RTL expansion as internal function calls on GIMPLE, called
"direct internal function".

> I guess you suggest adding internal function for VIEW_CONVERT_EXPR in gimple,
> and do the transform from internal function to vec_set optab in expander?

No, I suggest to "add" an internal function for the vec_set optab, see
DEF_INTERNAL_OPTAB_FN in internal-fn.def

> I doubt my understanding, as this looks really overcomplicated since we
> transform from VIEW_CONVERT_EXPR to vec_set optab directly so far...
> IIUC, an internal function doesn't seem to help much here, as Segher said before.

The advantage would be to circumvent GIMPLE's forcing of memory here.
But as I said here:

> > But then in GIMPLE D.3185 is also still memory (we don't have a variable
> > index partial register set operation - BIT_INSERT_EXPR is
> > currently specified to receive a constant bit position only).

it might not work out so easily.  Going down the rathole to avoid forcing
memory during RTL expansion for select cases (vector type bases
with a supported vector mode) might be something to try.

That at least would make the approach of dealing with this
in expand_assignment or siblings sensible.

> > At which point after your patch is the stack storage elided?
> >
>
> Stack storage is elided by the register reload pass in RTL.
>
>
> Thanks,
> Xionghu

Re: [PATCH] Cygwin/MinGW: Do not version lto plugins

2020-09-09 Thread JonY via Gcc-patches

On 9/9/20 7:21 AM, Richard Biener wrote:
> On Wed, Sep 9, 2020 at 2:28 AM JonY via Gcc-patches
>  wrote:
>>
>> Hello,
>>
>> The lto plugins are tied to the built GCC anyway, so there isn't much
>> point to versioning them.
> 
> In fact the lto plugins are not tied to the built GCCs very much, instead
> we try to ensure compatibility so that a single plugin can be used with
> multiple GCC versions.
> 

I see, I was not aware of this.

>> * gcc/config.host: Remove version string
>> * lto-plugin/Makefile.am: Use libtool -avoid-version
>> * lto-plugin/Makefile.in: Regenerate
>>
>> This patch has been in use with Cygwin gcc for a long time and should be
>> pushed upstream. Patch OK?
> 
> The libtool docs are not very specific here but does this affect the
> result for Linux ELF platforms at all?

With current builds:

/usr/libexec/gcc/x86_64-pc-linux-gnu/9.2.0/liblto_plugin.la
/usr/libexec/gcc/x86_64-pc-linux-gnu/9.2.0/liblto_plugin.so
/usr/libexec/gcc/x86_64-pc-linux-gnu/9.2.0/liblto_plugin.so.0
/usr/libexec/gcc/x86_64-pc-linux-gnu/9.2.0/liblto_plugin.so.0.0.0

With -avoid-version it will only generate liblto_plugin.so; according to
gcc/config.host, that is already what is loaded.

I will do a native linux build to confirm this.




Re: [PATCH] Makefile.tpl: Add check-g++

On Tue, Sep 8, 2020 at 11:35 AM Hu Jiangping  wrote:
>
> This patch adds a new check-g++ target to the toplevel Makefile,
> as a synonym of the check-c++ target.
>
> This is for consistency with the check-g++ target under the gcc
> subdirectory.  Because check-gcc can be run at toplevel,
> it is quite likely that check-g++ will be run at toplevel too,
> but it currently fails with a 'No rule to make target' error.

I don't think this is correct.  The toplevel check-gcc is not a simple
alias for gcc/ check-gcc, instead it is for the whole gcc/ subdirectory
checks.

Richard.

>
> ChangeLog:
> 2020-09-08 Hu Jiangping 
>
> Makefile.tpl (check-g++): New target. As synonym of check-c++.
> Makefile.in: Regenerated.
>
> Bootstrapped on aarch64. Ok for master?
>
> Regards!
> Hujp
>
> ---
>  Makefile.in  | 3 +++
>  Makefile.tpl | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/Makefile.in b/Makefile.in
> index 36e369df6e7..35b57d5af21 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -4,6 +4,9 @@ check-gcc-d:
>  check-d: check-gcc-d check-target-libphobos
>
>
> +.PHONY: check-g++
> +check-g++: check-c++
> +
>  # The gcc part of install-no-fixedincludes, which relies on an intimate
>  # knowledge of how a number of gcc internal targets (inter)operate.  Delegate.
>  .PHONY: gcc-install-no-fixedincludes
> diff --git a/Makefile.tpl b/Makefile.tpl
> index efed1511750..6dfe3c9caca 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -1542,6 +1542,9 @@ check-gcc-[+language+]:
>  check-[+language+]: check-gcc-[+language+][+ FOR lib-check-target +] [+ lib-check-target +][+ ENDFOR lib-check-target +]
>  [+ ENDFOR languages +]
>
> +.PHONY: check-g++
> +check-g++: check-c++
> +
>  # The gcc part of install-no-fixedincludes, which relies on an intimate
>  # knowledge of how a number of gcc internal targets (inter)operate.  Delegate.
>  .PHONY: gcc-install-no-fixedincludes
> --
> 2.17.1
>
>
>

[PATCH] tree-optimization/96978 - fix fallout of BB vectorization of live stmts

This avoids looking at STMT_VINFO_LIVE_P when vectorizing BBs.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2020-09-09  Richard Biener  

PR tree-optimization/96978
* tree-vect-stmts.c (vectorizable_condition): Do not
look at STMT_VINFO_LIVE_P for BB vectorization.
(vectorizable_comparison): Likewise.
---
 gcc/tree-vect-stmts.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index a7ffe72378f..065d1bf3caf 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9855,7 +9855,7 @@ vectorizable_condition (vec_info *vinfo,
return false;
 
   /* FORNOW: only supported as part of a reduction.  */
-  if (STMT_VINFO_LIVE_P (stmt_info))
+  if (loop_vinfo && STMT_VINFO_LIVE_P (stmt_info))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -10328,7 +10328,7 @@ vectorizable_comparison (vec_info *vinfo,
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
 return false;
 
-  if (STMT_VINFO_LIVE_P (stmt_info))
+  if (loop_vinfo && STMT_VINFO_LIVE_P (stmt_info))
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-- 
2.26.2

Re: [PATCH V2] libgccjit: Add new gcc_jit_context_new_blob entry point

2020-09-09 Thread Andrea Corallo

Andrea Corallo  writes:

[...]

> Sure it is, thanks for reviewing.
>
> Attached the updated version of the patch.
>
> make check-jit is clean plus I tested the new entry point with the
> modified Emacs.
>
> Thanks
>
>   Andrea

Ping

Re: [PATCH] libgccjit: Improve doc and comments regarding type casts

2020-09-09 Thread Andrea Corallo

Andrea Corallo  writes:

> Andrea Corallo  writes:
>
>> Hi Alex,
>>
>> Looking at the code I believe all these casts are meant to be supported
>> (read: your intuition was correct).
>>
>> Also, IMO a source of confusion is that the doc mentions 'int' and
>> 'float'; I believe it would be better to say 'integral' and
>> 'floating-point' to clearly disambiguate with respect to the C
>> types.
>>
>> AFAIU the set of supported casts should be like:
>>
>>  integral       <-> integral
>>  floating-point <-> floating-point
>>  integral       <-> floating-point
>>  integral       <-> bool
>>  P*             <-> Q*   for pointer types P and Q.
>>
>> I'd propose to install the following patch to make the doc and comments
>> consistent in documenting what we accept, and I guess we should just
>> treat it as a bug if any of these conversions is not handled correctly or
>> leads to an ICE.
>>
>> Bests
>>
>>   Andrea
>>
>> gcc/jit/ChangeLog
>>
>> 2020-07-21  Andrea Corallo  
>>
>>  * docs/_build/texinfo/libgccjit.texi (Type-coercion): Improve doc
>>  on allowed type casting.
>>  * docs/topics/expressions.rst (gccjit::context::new_cast)
>>  (gcc_jit_context_new_cast): Likewise.
>>  * libgccjit.c: Improve comment on allowed type casting.
>>  * libgccjit.h: Likewise
>>
>> From 914b9e86808c947d4bb2b06c6960fd8031125f67 Mon Sep 17 00:00:00 2001
>> From: Andrea Corallo 
>> Date: Tue, 21 Jul 2020 20:12:23 +0200
>> Subject: [PATCH] libgccjit: improve documentation on type conversions

[...]

> Ping
>
> Thanks
>   Andrea

Ping

Re: [PATCH] Implement __builtin_thread_pointer for x86 TLS

2020-09-09 Thread Hongtao Liu via Gcc-patches

On Wed, Sep 9, 2020 at 2:35 PM Jakub Jelinek  wrote:
>
> On Wed, Sep 09, 2020 at 10:30:46AM +0800, Hongtao Liu wrote:
> > From 400418fadce46e7db7bd37be45ef5ff5beb08d19 Mon Sep 17 00:00:00 2001
> > From: liuhongt 
> > Date: Tue, 8 Sep 2020 15:44:58 +0800
> > Subject: [PATCH] Implement __builtin_thread_pointer for x86 TLS.
> >
> > gcc/ChangeLog:
> >   PR target/96955
> >   * config/i386/i386.md (get_thread_pointer): New
> >   expander.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/i386/builtin_thread_pointer.c: New test.
> > ---
> >  gcc/config/i386/i386.md   | 10 +++
> >  .../gcc.target/i386/builtin_thread_pointer.c  | 28 +++
> >  2 files changed, 38 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/builtin_thread_pointer.c
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 446793b78db..2f6eb0a7b98 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -15433,6 +15433,16 @@ (define_insn_and_split "*tls_local_dynamic_32_once"
> >(clobber (reg:CC FLAGS_REG))])])
> >
> >  ;; Load and add the thread base pointer from %:0.
> > +(define_expand "get_thread_pointer"
> > +  [(set (match_operand:PTR 0 "register_operand")
> > + (unspec:PTR [(const_int 0)] UNSPEC_TP))]
> > +  ""
> > +{
> > +  /* targetm is not existed in the scope of condition.  */
>
> Reword as "targetm is not visible in the scope of the condition."
> In fact, even if it was, it wouldn't help, because expand_builtin_thread_pointer
> assumes that if the expander exists, then it will work and emit some code
> and emits the error only if the expander doesn't exist.
>
> Ok for trunk with that change, thanks.
>

Thanks for the review.

> > +  if (!targetm.have_tls)
> > +error ("%<__builtin_thread_pointer%> is not supported on this target");
> > +})
> > +
>
> Jakub
>


-- 
BR,
Hongtao
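
As a usage illustration, the builtin could be called from C roughly as follows. The availability guard is an assumption made for this sketch (GCC 11 or later on Linux for x86 or AArch64), and current_thread_pointer is a hypothetical helper name; elsewhere the sketch simply returns NULL rather than guessing at target support.

```c
#include <stddef.h>

/* Only use the builtin where this sketch assumes it is available.  */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11 \
    && defined(__linux__) \
    && (defined(__x86_64__) || defined(__i386__) || defined(__aarch64__))
# define HAVE_BUILTIN_TP 1
#else
# define HAVE_BUILTIN_TP 0
#endif

/* Hypothetical helper: return the calling thread's thread pointer,
   or NULL when the guard above does not hold.  */
void *current_thread_pointer(void)
{
#if HAVE_BUILTIN_TP
  return __builtin_thread_pointer();
#else
  return NULL;
#endif
}
```

Within one thread the returned value does not change between calls, which makes it usable as a cheap thread identity.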

RE: [PATCH PR96357][GCC][AArch64]: could not split insn UNSPEC_COND_FSUB with AArch64 SVE

2020-09-09 Thread Przemyslaw Wirkus

Hello maintainers,

Can I backport this patch to GCC 10 please ?

Regards
Przemyslaw

> Committed with:
> 
> commit b648814c02eb418aaf27897c480452172ee96303
> Date:   Fri Aug 28 11:31:04 2020 +0100
> 
> Kind regards,
> Przemyslaw

Re: [PATCH PR96357][GCC][AArch64]: could not split insn UNSPEC_COND_FSUB with AArch64 SVE

2020-09-09 Thread Richard Sandiford

Przemyslaw Wirkus  writes:
> Hello maintainers,
>
> Can I backport this patch to GCC 10 please ?

Sure, that's fine.

Thanks,
Richard

>
> Regards
> Przemyslaw
>
>> Committed with:
>> 
>> commit b648814c02eb418aaf27897c480452172ee96303
>> Date:   Fri Aug 28 11:31:04 2020 +0100
>> 
>> Kind regards,
>> Przemyslaw

RE: [PATCH PR96357][GCC][AArch64]: could not split insn UNSPEC_COND_FSUB with AArch64 SVE

2020-09-09 Thread Przemyslaw Wirkus

> Przemyslaw Wirkus  writes:
> > Hello maintainers,
> >
> > Can I backport this patch to GCC 10 please ?
> 
> Sure, that's fine.

commit 41d22ec51c4190133a082197e7ff67b4741fc09b
Date:   Fri Aug 28 11:31:04 2020 +0100

> Thanks,
> Richard
> 
> >
> > Regards
> > Przemyslaw
> >
> >> Committed with:
> >>
> >> commit b648814c02eb418aaf27897c480452172ee96303
> >> Date:   Fri Aug 28 11:31:04 2020 +0100
> >>
> >> Kind regards,
> >> Przemyslaw

RE: [PATCH] Makefile.tpl: Add check-g++

2020-09-09 Thread Hu, Jiangping

Hi, Richard

> On Tue, Sep 8, 2020 at 11:35 AM Hu Jiangping 
> wrote:
> >
> > This patch adds a new check-g++ target to the toplevel Makefile,
> > as a synonym of the check-c++ target.
> >
> > This is for consistency with the check-g++ target under the gcc
> > subdirectory.  Because check-gcc can be run at toplevel,
> > it is quite likely that check-g++ will be run at toplevel too,
> > but it currently fails with a 'No rule to make target' error.
> 
> I don't think this is correct.  The toplevel check-gcc is not a simple
> alias for gcc/ check-gcc, instead it is for the whole gcc/ subdirectory
> checks.
Thanks for the reply.

Yes, I know what the toplevel check-gcc does, and how it differs from gcc/ check-gcc.
IIUC you mean that adding a toplevel check-g++ would make the toplevel check-gcc
and check-g++ a little strange, because the toplevel check-gcc is not only for C,
while the toplevel check-g++ would be only for C++.

I agree with you on this point, and I thought about it before.
I just thought that check-g++ is easily run at toplevel by mistake,
so could we make it do something sensible in that case?  As for whether the
meanings of check-gcc and check-g++ are inconsistent, it may not be that
important, or we could modify the document https://gcc.gnu.org/install/test.html
to describe what they actually do.

Thanks again. I agree with you more now.

Regards!
Hujp

> 
> Richard.
> 

Re: [PATCH] Makefile.tpl: Add check-g++

On Wed, Sep 9, 2020 at 10:43 AM Hu, Jiangping
 wrote:
>
> Hi, Richard
>
> > On Tue, Sep 8, 2020 at 11:35 AM Hu Jiangping 
> > wrote:
> > >
> > > This patch adds a new check-g++ target to the toplevel Makefile,
> > > as a synonym of the check-c++ target.
> > >
> > > This is for consistency with the check-g++ target under the gcc
> > > subdirectory.  Because check-gcc can be run at toplevel,
> > > it is quite likely that check-g++ will be run at toplevel too,
> > > but it currently fails with a 'No rule to make target' error.
> >
> > I don't think this is correct.  The toplevel check-gcc is not a simple
> > alias for gcc/ check-gcc, instead it is for the whole gcc/ subdirectory
> > checks.
> Thanks for the reply.
>
> Yes, I know what the toplevel check-gcc does, and how it differs from gcc/ check-gcc.
> IIUC you mean that adding a toplevel check-g++ would make the toplevel check-gcc
> and check-g++ a little strange, because the toplevel check-gcc is not only for C,
> while the toplevel check-g++ would be only for C++.
>
> I agree with you on this point, and I thought about it before.
> I just thought that check-g++ is easily run at toplevel by mistake,
> so could we make it do something sensible in that case?  As for whether the
> meanings of check-gcc and check-g++ are inconsistent, it may not be that
> important, or we could modify the document https://gcc.gnu.org/install/test.html
> to describe what they actually do.

To word it differently:
The toplevel make check-gcc tests the gcc module; since there is no c++
module, a toplevel check-c++ does not make sense.

Richard.

> Thanks again. I agree with you more now.
>
> Regards!
> Hujp
>
> >
> > Richard.
> >

Re: [PATCH, rs6000] Add non-relative jump table support on Power Linux

2020-09-09 Thread HAO CHEN GUI via Gcc-patches


Hi Segher,

    Thanks for your advice. I removed the macros defined in linux64.h and
linux.h, so those targets use relative jump tables by default. When
-mno-relative-jumptables is set, absolute jump tables are used.
Everything related to section relocations has been moved to a separate
patch. Thanks again.
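
To make the two table flavours concrete, here is a small sketch in plain C that models the difference. It is not how GCC emits jump tables: a character array stands in for the code region, and the names code, abs_table, rel_table, lookup_abs and lookup_rel are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a code region; each "case label" is a byte offset into it.  */
static const char code[] = "AAAABBBBCCCC";

/* Absolute table: every entry is a full pointer-sized address.  */
static const char *const abs_table[3] = { code + 0, code + 4, code + 8 };

/* Relative table: every entry is a 32-bit offset from a common base,
   so entries can stay 4 bytes wide even on a 64-bit target.  */
static const int32_t rel_table[3] = { 0, 4, 8 };

/* Absolute lookup: load the target address directly.  */
const char *lookup_abs(size_t i) { return abs_table[i]; }

/* Relative lookup: load the offset, then add the base.  */
const char *lookup_rel(size_t i) { return code + rel_table[i]; }
```

The relative form costs one extra add per dispatch but keeps the table compact, which matches the SImode versus Pmode choice for CASE_VECTOR_MODE raised in the review below.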



On 8/9/2020 5:46 AM, Segher Boessenkool wrote:

On Mon, Aug 24, 2020 at 03:48:43PM +0800, HAO CHEN GUI wrote:

I'll try to be quicker at reviewing iterations of this -- there is quite
some way to go, without me slowing things down!

Sigh :-(


* config/rs6000/linux.h (rs6000_relative_jumptables): Define.

That macro looks like it is variable (or function).  *Make* it a
variable, please?


* config/rs6000/rs6000.c (TARGET_ASM_GENERATE_PIC_ADDR_DIFF_VEC):
Define

Period?


(rs6000_gen_pic_addr_diff_vec, rs6000_output_addr_vec_elt): Implement.

"New function."


* config/rs6000/rs6000.md (absolute_tablejumpsi,
absolute_tablejumpsi_nospec, absolute_tablejumpdi,
absolute_tablejumpdi_nospec): Add four new expansions.

"New define_expands." or "New expanders."


* config/rs6000/rs6000.opt (mrelative-jumptables): Add a new option and
set rs6000_relative_jumptables to true by default.

"rs6000.opt: Add -mrelative-jumptables."


+/* Disable relative jump tables for Power Linux.  */
+#undef rs6000_relative_jumptables
+#define rs6000_relative_jumptables 0

Why?


+/* Disable relative jump tables for Power Linux64.  */
+#undef rs6000_relative_jumptables
+#define rs6000_relative_jumptables 0

(That's not what it's called...  Just don't say the "for..." at all?
It is clear from what file it is in.)


  /* Indicate that jump tables go in the text section.  */
  #undef  JUMP_TABLES_IN_TEXT_SECTION
-#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT
+#define JUMP_TABLES_IN_TEXT_SECTION rs6000_relative_jumptables

Not sure that is correct.  Maybe the patch using rodata (.data.rel.ro)
should be a separate patch?


  /* Define as C expression which evaluates to nonzero if the tablejump
 instruction expects the table to contain offsets from the address of the
 table.
 Do not define this if the table should contain absolute addresses.  */
-#define CASE_VECTOR_PC_RELATIVE 1
+#define CASE_VECTOR_PC_RELATIVE 0

This should depend on the new flag?


+/* Specify the machine mode that this machine uses
+   for the index in the tablejump instruction.  */
+#define CASE_VECTOR_MODE \
+  (TARGET_32BIT || rs6000_relative_jumptables ? SImode : DImode)

rs6000_relative_jumptables ? SImode : Pmode;


+  if (rs6000_relative_jumptables)
+   {
+ if (TARGET_32BIT)
+   emit_jump_insn (gen_tablejumpsi (operands[0], operands[1]));
+ else
+   emit_jump_insn (gen_tablejumpdi (operands[0], operands[1]));
+   }

Hrm, I guess we should make that a parameterized name (future work,
don't do it now :-) )


+(define_expand "absolute_tablejumpsi"

Don't prefix names; it should start with "tablejump".


Segher
* config/rs6000/rs6000-protos.h (rs6000_output_addr_vec_elt): Declare.
* config/rs6000/rs6000.c (TARGET_ASM_GENERATE_PIC_ADDR_DIFF_VEC):
Define.
(rs6000_gen_pic_addr_diff_vec, rs6000_output_addr_vec_elt): Implement.
* config/rs6000/rs6000.h (CASE_VECTOR_PC_RELATIVE,
CASE_VECTOR_MODE, ASM_OUTPUT_ADDR_VEC_ELT): Define.
* config/rs6000/rs6000.md (tablejumpsi_absolute,
tablejumpsi_nospec_absolute, tablejumpdi_absolute,
tablejumpdi_nospec_absolute): New expanders.
* config/rs6000/rs6000.opt (mrelative-jumptables): Add
mrelative-jumptables.
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 5508484ba19..62564dd67f2 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -155,6 +155,8 @@ extern void rs6000_split_logical (rtx [], enum rtx_code, bool, bool, bool);
 extern bool rs6000_pcrel_p (struct function *);
 extern bool rs6000_fndecl_pcrel_p (const_tree);
 
+extern void rs6000_output_addr_vec_elt (FILE *, int);
+
 /* Different PowerPC instruction formats that are used by GCC.  There are
various other instruction formats used by the PowerPC hardware, but these
formats are not currently used by GCC.  */
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 58f5d780603..94d1e650b94 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1369,6 +1369,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA
 #define TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA rs6000_output_addr_const_extra
 
+#undef  TARGET_ASM_GENERATE_PIC_ADDR_DIFF_VEC
+#define TARGET_ASM_GENERATE_PIC_ADDR_DIFF_VEC rs6000_gen_pic_addr_diff_vec
+
 #undef TARGET_LEGITIMIZE_ADDRESS
 #define TARGET_LEGITIMIZE_ADDRESS rs6000_legitimize_address
 
@@ -26494,6 +26497,27 @@ rs6000_cannot_substitute_mem_equiv_p (rtx mem)
   return fal

[PATCH] enable live condition vectorization

This removes a check preventing vectorization of live results of
vectorized conditions.
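
For readers unfamiliar with the term, here is a minimal sketch (not part
of the patch) of what a "live" condition result looks like and how a
vectorized loop can still produce it.  The names cond_live_scalar and
cond_live_model, the array length N and the vectorization factor VF are
assumptions made up for this illustration; the second function is only a
hand-written model of lane-wise execution, not what the vectorizer emits.

```c
#include <assert.h>

#define N 16   /* illustrative trip count, assumed to be a multiple of VF */
#define VF 4   /* hypothetical vectorization factor */

/* Scalar reference: tem is "live" because it is used after the loop.  */
static int
cond_live_scalar (const int *a, int *b)
{
  int tem = 0;
  for (int i = 0; i < N; ++i)
    {
      tem = a[i] < 0 ? -a[i] - 1 : a[i];
      b[i] = tem + 10;
    }
  return tem;
}

/* Model of the vectorized form: each outer iteration handles VF lanes,
   and the live value is read from the last lane of the final vector.  */
static int
cond_live_model (const int *a, int *b)
{
  int lanes[VF] = { 0 };
  for (int i = 0; i < N; i += VF)
    for (int l = 0; l < VF; ++l)
      {
        int x = a[i + l];
        lanes[l] = x < 0 ? -x - 1 : x;   /* per-lane select */
        b[i + l] = lanes[l] + 10;
      }
  return lanes[VF - 1];
}
```

Both functions return the same value for the same input, which is the
property the removed check used to give up on.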

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-09-09  Richard Biener  

* tree-vect-stmts.c (vectorizable_condition): Allow
STMT_VINFO_LIVE_P stmts.

* gcc.dg/vect/vect-cond-13.c: New testcase.
* gcc.target/i386/pr87007-4.c: Adjust.
* gcc.target/i386/pr87007-5.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/vect-cond-13.c  | 38 +++
 gcc/testsuite/gcc.target/i386/pr87007-4.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr87007-5.c |  2 +-
 gcc/tree-vect-stmts.c |  9 --
 4 files changed, 40 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-13.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-13.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-13.c
new file mode 100644
index 000..2dfb8797cd8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-13.c
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+int a[1024];
+int b[1024];
+
+int
+foo ()
+{
+  int tem;
+  for (int i = 0; i < 1024; ++i)
+{
+  if (a[i] < 0)
+tem = -a[i] - 1;
+  else
+tem = a[i];
+  b[i] = tem + 10;
+}
+  return tem;
+}
+
+int main()
+{
+  check_vect ();
+
+  for (int i = 0; i < 1024; ++i)
+{
+  a[i] = i - 333;
+  __asm__ volatile ("" ::: "memory");
+}
+  int res = foo ();
+  if (res != 1023 - 333)
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target vect_condition } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c 
b/gcc/testsuite/gcc.target/i386/pr87007-4.c
index e91bdcbac44..9c4b8005af3 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
@@ -15,4 +15,4 @@ foo (int n, int k)
   d1 = ceil (d3);
 }
 
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c 
b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index 20d13cf650b..e4d956a5d7f 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -15,4 +15,4 @@ foo (int n, int k)
   d1 = sqrt (d3);
 }
 
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 065d1bf3caf..e069f874f72 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9853,15 +9853,6 @@ vectorizable_condition (vec_info *vinfo,
 {
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
return false;
-
-  /* FORNOW: only supported as part of a reduction.  */
-  if (loop_vinfo && STMT_VINFO_LIVE_P (stmt_info))
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"value used after loop.\n");
- return false;
-   }
 }
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-- 
2.26.2

[PATCH 1/2] aarch64: Add support for Armv8-R

2020-09-09 Thread Alex Coplan

Hello,

This patch adds support for Armv8-R AArch64 to GCC. It adds the -march
value armv8-r and sets the ACLE feature macro __ARM_ARCH_PROFILE
correctly when -march is set to armv8-r.
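
As a rough illustration of how user code might observe the macro, here is
a small sketch.  It is not the new test from this patch; the helper name
arch_profile and the '?' fallback are assumptions added so the snippet
also compiles on targets that do not define __ARM_ARCH_PROFILE.

```c
#include <assert.h>

/* Return the ACLE architecture profile as a character, or '?' when the
   macro is not defined (for example on a non-Arm target).  */
static char
arch_profile (void)
{
#ifdef __ARM_ARCH_PROFILE
  return (char) __ARM_ARCH_PROFILE;
#else
  return '?';
#endif
}
```

With a compiler that includes this patch, building with -march=armv8-r
would be expected to make arch_profile () return 'R' rather than 'A'.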

Testing:
 * Bootstrapped and regtested on aarch64-none-linux-gnu.
 * New unit test to check ACLE macro.

OK for master?

Thanks,
Alex

---

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.c
(aarch64_get_extension_string_for_isa_flags): Don't force +crc for
Armv8-R.
* config/aarch64/aarch64-arches.def: Add entry for Armv8-R.
* config/aarch64/aarch64-c.c (aarch64_define_unconditional_macros): Set
__ARM_ARCH_PROFILE correctly for Armv8-R.
* config/aarch64/aarch64.h (AARCH64_FL_V8_R): New.
(AARCH64_FL_FOR_ARCH8_R): New.
(AARCH64_ISA_V8_R): New.
* doc/invoke.texi: Add Armv8-R to architecture table.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/armv8-r.c: New test.

---
 gcc/common/config/aarch64/aarch64-common.c  | 7 +--
 gcc/config/aarch64/aarch64-arches.def   | 1 +
 gcc/config/aarch64/aarch64-c.c  | 3 ++-
 gcc/config/aarch64/aarch64.h| 5 +
 gcc/doc/invoke.texi | 1 +
 gcc/testsuite/gcc.target/aarch64/acle/armv8-r.c | 6 ++
 6 files changed, 20 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/armv8-r.c
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 51bd319d6d3..909006e6194 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -426,8 +426,11 @@ aarch64_get_extension_string_for_isa_flags (uint64_t isa_flags,
   names.  However as a special case if CRC was enabled before, always print
   it.  This is required because some CPUs have an incorrect specification
   in older assemblers.  Even though CRC should be the default for these
-  cases the -mcpu values won't turn it on.  */
-  if (isa_flags & AARCH64_ISA_CRC)
+  cases the -mcpu values won't turn it on.
+
+  Note that assemblers with Armv8-R AArch64 support should not have this
+  issue, so we don't need this fix when targeting Armv8-R.  */
+  if ((isa_flags & AARCH64_ISA_CRC) && !AARCH64_ISA_V8_R)
 isa_flag_bits |= AARCH64_ISA_CRC;
 
   /* Pass Two:
diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index 3be55fa29aa..389084f56c2 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -37,5 +37,6 @@ AARCH64_ARCH("armv8.3-a", generic,	 8_3A,	8,  AARCH64_FL_FOR_ARCH8_3)
 AARCH64_ARCH("armv8.4-a", generic,	 8_4A,	8,  AARCH64_FL_FOR_ARCH8_4)
 AARCH64_ARCH("armv8.5-a", generic,	 8_5A,	8,  AARCH64_FL_FOR_ARCH8_5)
 AARCH64_ARCH("armv8.6-a", generic,	 8_6A,	8,  AARCH64_FL_FOR_ARCH8_6)
+AARCH64_ARCH("armv8-r",   generic,	 8R  ,	8,  AARCH64_FL_FOR_ARCH8_R)
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index fd08be47570..69691b3ad72 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -63,7 +63,8 @@ aarch64_define_unconditional_macros (cpp_reader *pfile)
  as interoperability with the same arm macro.  */
   builtin_define ("__ARM_ARCH_8A");
 
-  builtin_define_with_int_value ("__ARM_ARCH_PROFILE", 'A');
+  builtin_define_with_int_value ("__ARM_ARCH_PROFILE",
+  AARCH64_ISA_V8_R ? 'R' : 'A');
   builtin_define ("__ARM_FEATURE_CLZ");
   builtin_define ("__ARM_FEATURE_IDIV");
   builtin_define ("__ARM_FEATURE_UNALIGNED");
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index d3e89d1789a..12d43674197 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -161,6 +161,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_LSE	  (1 << 4)  /* Has Large System Extensions.  */
 #define AARCH64_FL_RDMA   (1 << 5)  /* Has Round Double Multiply Add.  */
 #define AARCH64_FL_V8_1   (1 << 6)  /* Has ARMv8.1-A extensions.  */
+/* Armv8-R.  */
+#define AARCH64_FL_V8_R   (1 << 7)  /* Armv8-R AArch64.  */
 /* ARMv8.2-A architecture extensions.  */
 #define AARCH64_FL_V8_2   (1 << 8)  /* Has ARMv8.2-A features.  */
 #define AARCH64_FL_F16	  (1 << 9)  /* Has ARMv8.2-A FP16 extensions.  */
@@ -246,6 +248,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_FOR_ARCH8_6			\
   (AARCH64_FL_FOR_ARCH8_5 | AARCH64_FL_V8_6 | AARCH64_FL_FPSIMD \
| AARCH64_FL_I8MM | AARCH64_FL_BF16)
+#define AARCH64_FL_FOR_ARCH8_R \
+  (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_R)
 
 /* Macros to test ISA flags.  */
 
@@ -282,6 +286,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_F64MM	   (aarch64_isa_flags & AARCH64_FL_F64MM)
 #define AARCH64_ISA_BF16	   (aarch64_isa_flags & AARCH64_FL_BF16)
 #define AARCH64_ISA_SB		   (aarc

[PATCH 2/2] aarch64: Add support for Cortex-R82

2020-09-09 Thread Alex Coplan

This patch adds support for Arm's Cortex-R82 CPU to GCC. For more
information about this CPU, see [0].

Testing:
 * Bootstrapped and regtested on aarch64-none-linux-gnu, no regressions.

[0] : https://developer.arm.com/ip-products/processors/cortex-r/cortex-r82

OK for trunk?

---

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Add Cortex-R82.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Add entry for Cortex-R82.

---
 gcc/config/aarch64/aarch64-cores.def | 3 +++
 gcc/config/aarch64/aarch64-tune.md   | 2 +-
 gcc/doc/invoke.texi  | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index a7dde38d768..f30ff35377c 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -150,4 +150,7 @@ AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, cortexa53, 8A,  AARCH
 AARCH64_CORE("cortex-a75.cortex-a55",  cortexa75cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 0xd05), -1)
 AARCH64_CORE("cortex-a76.cortex-a55",  cortexa76cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, neoversen1, 0x41, AARCH64_BIG_LITTLE (0xd0b, 0xd05), -1)
 
+/* Armv8-R Architecture Processors.  */
+AARCH64_CORE("cortex-r82", cortexr82, cortexa53, 8R, AARCH64_FL_FOR_ARCH8_R, cortexa53, 0x41, 0xd15, -1)
+
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index ebf97c38fbd..0e3239c670e 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa65,cortexa65ae,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
+	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa65,cortexa65ae,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ed18b67207d..a604710600e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17371,8 +17371,8 @@ performance of the code.  Permissible values for this option are:
 @samp{thunderxt83}, @samp{thunderx2t99}, @samp{thunderx3t110}, @samp{zeus},
 @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
 @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53},
-@samp{cortex-a75.cortex-a55}, @samp{cortex-a76.cortex-a55}
-@samp{native}.
+@samp{cortex-a75.cortex-a55}, @samp{cortex-a76.cortex-a55},
+@samp{cortex-r82}, @samp{native}.
 
 The values @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
 @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53},

[PATCH] enable live comparison vectorization

This removes a check preventing vectorization of live results of
vectorized comparisons.  I tested it with AVX512 mask registers
(inspecting assembly) and traditional vector masks.
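
A conceptual sketch of the "traditional vector mask" case follows.  It is
only a scalar model written for this note: the names live_cmp_scalar and
live_cmp_model, the factor VF and the all-ones encoding of a true lane are
assumptions for illustration, and n is assumed to be a multiple of VF.

```c
#include <assert.h>
#include <stdbool.h>

#define VF 4   /* hypothetical vectorization factor */

/* Scalar reference: the comparison result tem is used after the loop.  */
static bool
live_cmp_scalar (const int *a, int *b, int n)
{
  bool tem = false;
  for (int i = 0; i < n; ++i)
    {
      tem = !a[i];
      b[i] = tem;
    }
  return tem;
}

/* Model of a mask-vector form: a true lane is all-ones (-1), a false
   lane is 0, and the live result is taken from the last lane.  */
static bool
live_cmp_model (const int *a, int *b, int n)
{
  int mask[VF] = { 0 };
  for (int i = 0; i + VF <= n; i += VF)
    for (int l = 0; l < VF; ++l)
      {
        mask[l] = (a[i + l] == 0) ? -1 : 0;
        b[i + l] = mask[l] & 1;   /* convert mask lane to 0/1 */
      }
  return mask[VF - 1] != 0;
}
```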

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

One alternative to live stmt vectorization is to force a scalar
epilogue, by the way.  For the case where we generate one anyway
(but do not necessarily enter it), this might be the overall
cheaper way of dealing with live ops, since it avoids keeping
some registers live across the loop nests.

Richard.

2020-09-09  Richard Biener  

* tree-vect-stmts.c (vectorizable_comparison): Allow
STMT_VINFO_LIVE_P stmts.

* gcc.dg/vect/vect-live-6.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/vect-live-6.c | 31 +
 gcc/tree-vect-stmts.c   |  8 ---
 2 files changed, 31 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-live-6.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-6.c 
b/gcc/testsuite/gcc.dg/vect/vect-live-6.c
new file mode 100644
index 000..c986c97650f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-6.c
@@ -0,0 +1,31 @@
+#include "tree-vect.h"
+
+int a[1024];
+int b[1024];
+
+_Bool
+fn1 ()
+{
+  _Bool tem;
+  for (int i = 0; i < 1024; ++i)
+{
+  tem = !a[i];
+  b[i] = tem;
+}
+  return tem;
+}
+
+int main()
+{
+  check_vect ();
+  for (int i = 0; i < 1024; ++i)
+{
+  a[i] = i & 5;
+  __asm__ volatile ("" ::: "memory");
+}
+  if (fn1 () != !(1023 & 5) || b[2] != 1)
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target vect_int } } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index e069f874f72..191957c3543 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -10319,14 +10319,6 @@ vectorizable_comparison (vec_info *vinfo,
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
 return false;
 
-  if (loop_vinfo && STMT_VINFO_LIVE_P (stmt_info))
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"value used after loop.\n");
-  return false;
-}
-
   gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
   if (!stmt)
 return false;
-- 
2.26.2

Re: [Patch] Fortran: Fixes for OpenMP loop-iter privatization (PRs 95109 + 94690)


On 9/9/20 8:54 AM, Jakub Jelinek wrote:


On Tue, Sep 08, 2020 at 12:22:57PM +0200, Tobias Burnus wrote:

gcc/testsuite/ChangeLog:

 PR fortran/95109
 PR fortran/94690
 * gfortran.dg/gomp/combined-if.f90: Update scan-tree-dump-times for
 'omp simd.*if'.
 * gfortran.dg/gomp/openmp-simd-5.f90: New test.


I have applied a follow-up commit for nvptx as the scan times increased
– and are/were different (with -O1 and higher; the testsuite uses "-O".)


LGTM, thanks.
Note for OpenMP 5.0 we'll need also
EXEC_OMP_{,PARALLEL_}MASTER_TASKLOOP{,_SIMD},
EXEC_OMP_PARALLEL_{LOOP,MASTER}
EXEC_OMP_TARGET_{PARALLEL,TEAMS}_LOOP
EXEC_OMP_TEAMS_LOOP
handling for the combined constructs in these.


Indeed. 'omp loop' is one of the larger features still missing
from gfortran that is already supported in C/C++.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
commit 357e04046cdb0c27aaa2c6673c913b75dd454daa
Author: Tobias Burnus 
Date:   Wed Sep 9 11:44:55 2020 +0200

gfortran.dg/gomp/combined-if.f90: Update nvptx tree-dump times

nvptx has additional omp simd lines with _simt_ with -O1 and higher.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/combined-if.f90: Update scan-tree-dump-times for
'omp simd.*if' for nvptx even more.

diff --git a/gcc/testsuite/gfortran.dg/gomp/combined-if.f90 b/gcc/testsuite/gfortran.dg/gomp/combined-if.f90
index d9e4a26ca0c..003821289a6 100644
--- a/gcc/testsuite/gfortran.dg/gomp/combined-if.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/combined-if.f90
@@ -105,5 +105,5 @@ end module
 
 ! { dg-final { scan-tree-dump-times "(?n)#pragma omp target.* if\\(" 9 "omplower" } }
 ! { dg-final { scan-tree-dump-times "(?n)#pragma omp simd.* if\\(" 5 "omplower" { target { ! offload_nvptx } } } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma omp simd.* if\\(" 7 "omplower" { target { offload_nvptx } } } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma omp simd.* if\\(" 9 "omplower" { target { offload_nvptx } } } }
 ! { dg-final { scan-tree-dump-times "(?n)#pragma omp parallel.* if\\(" 6 "omplower" } }
commit f93eba8c5fde83100bf0854607848b6f50c8bbb2
Author: Tobias Burnus 
Date:   Wed Sep 9 11:54:43 2020 +0200

Fortran: Fixes for OpenMP loop-iter privatization (PRs 95109 + 94690)

This commit also fixes a gfortran.dg/gomp/target1.f90 regression;
target1.f90 tests the resolve.c and openmp.c changes.

gcc/fortran/ChangeLog:

PR fortran/95109
PR fortran/94690
* resolve.c (gfc_resolve_code): Also call
gfc_resolve_omp_parallel_blocks for 'distribute parallel do (simd)'.
* openmp.c (gfc_resolve_omp_parallel_blocks): Handle it.
* trans-openmp.c (gfc_trans_omp_target): For TARGET_PARALLEL_DO_SIMD,
call simd not do processing function.

gcc/testsuite/ChangeLog:

PR fortran/95109
PR fortran/94690
* gfortran.dg/gomp/openmp-simd-5.f90: New test.

(cherry picked from commit 61c2d476a52bb108bd05d0226c5522bf0c4b24b5)
---
 gcc/fortran/openmp.c |  2 ++
 gcc/fortran/resolve.c|  2 ++
 gcc/fortran/trans-openmp.c   |  8 +++-
 gcc/testsuite/gfortran.dg/gomp/openmp-simd-5.f90 | 24 
 4 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 930bca541b9..4f472dbc936 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5597,6 +5597,8 @@ gfc_resolve_omp_parallel_blocks (gfc_code *code, gfc_namespace *ns)
 
   switch (code->op)
 {
+case EXEC_OMP_DISTRIBUTE_PARALLEL_DO:
+case EXEC_OMP_DISTRIBUTE_PARALLEL_DO_SIMD:
 case EXEC_OMP_PARALLEL_DO:
 case EXEC_OMP_PARALLEL_DO_SIMD:
 case EXEC_OMP_TARGET_PARALLEL_DO:
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 533738b0b38..c650096df6d 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -11690,6 +11690,8 @@ gfc_resolve_code (gfc_code *code, gfc_namespace *ns)
 	  omp_workshare_flag = 1;
 	  gfc_resolve_omp_parallel_blocks (code, ns);
 	  break;
+	case EXEC_OMP_DISTRIBUTE_PARALLEL_DO:
+	case EXEC_OMP_DISTRIBUTE_PARALLEL_DO_SIMD:
 	case EXEC_OMP_PARALLEL:
 	case EXEC_OMP_PARALLEL_DO:
 	case EXEC_OMP_PARALLEL_DO_SIMD:
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index d99672742df..c01c4d79219 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -5341,13 +5341,19 @@ gfc_trans_omp_target (gfc_code *code)
   }
   break;
 case EXEC_OMP_TARGET_PARALLEL_DO:
-case EXEC_OMP_TARGET_PARALLEL_DO_SIMD:
   stmt = gfc_trans_omp_parallel_do (code, &block, clausesa);
   if (TREE_CODE (stmt) != BIND_EXPR)
 	stmt = buil

[PATCH v3] doc: change 'make check-g++' to 'make check-c++' in install.texi

2020-09-09 Thread Hu Jiangping

This patch changes the command 'make check-g++' to 'make check-c++' in
install.texi, since there is no 'make check-g++' target in the object
directory. It also extends the preceding text to clarify and emphasize
the difference between the 'make check-' targets in the object directory
and those in the gcc subdirectory.

The earlier patch discussion is as follows:
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552761.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553307.html

Ok for master?

Regards!
Hujp

---
 gcc/doc/install.texi | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 5330bf3bb29..0a6a3271757 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2964,10 +2964,13 @@ on a simulator as described at 
@uref{http://gcc.gnu.org/simtest-howto.html}.
 
 In order to run sets of tests selectively, there are targets
 @samp{make check-gcc} and language specific @samp{make check-c},
-@samp{make check-c++}, @samp{make check-d} @samp{make check-fortran},
+@samp{make check-c++}, @samp{make check-d}, @samp{make check-fortran},
 @samp{make check-ada}, @samp{make check-objc}, @samp{make check-obj-c++},
 @samp{make check-lto}
-in the @file{gcc} subdirectory of the object directory.  You can also
+in both the object directory and the @file{gcc} subdirectory of the
+object directory.  Note these targets in the object directory will do
+both the language tests and the related library tests while in the @file{gcc}
+subdirectory of the object directory will do only language tests.  You can also
 just run @samp{make check} in a subdirectory of the object directory.
 
 
@@ -2978,11 +2981,11 @@ testsuite is to use
 make check-gcc RUNTESTFLAGS="execute.exp @var{other-options}"
 @end smallexample
 
-Likewise, in order to run only the @command{g++} ``old-deja'' tests in
+Likewise, in order to run only the @command{c++} ``old-deja'' tests in
 the testsuite with filenames matching @samp{9805*}, you would use
 
 @smallexample
-make check-g++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}"
+make check-c++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}"
 @end smallexample
 
 The file-matching expression following @var{filename}@command{.exp=} is treated
@@ -2991,8 +2994,8 @@ may be passed, although any whitespace must either be 
escaped or surrounded by
 single quotes if multiple expressions are desired. For example,
 
 @smallexample
-make check-g++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c @var{other-options}"
-make check-g++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' @var{other-options}"
+@dots{}"old-deja.exp=9805*\ virtual2.c @var{other-options}"
+@dots{}"'old-deja.exp=9805* virtual2.c' @var{other-options}"
 @end smallexample
 
 The @file{*.exp} files are located in the testsuite directories of the GCC
@@ -3010,10 +3013,10 @@ You can pass multiple options to the testsuite using the
 work outside the makefiles.  For example,
 
 @smallexample
-make check-g++ RUNTESTFLAGS="--target_board=unix/-O3/-fmerge-constants"
+@dots{}"--target_board=unix/-O3/-fmerge-constants"
 @end smallexample
 
-will run the standard @command{g++} testsuites (``unix'' is the target name
+will run the standard @command{c++} testsuites (``unix'' is the target name
 for a standard native testsuite situation), passing
 @samp{-O3 -fmerge-constants} to the compiler on every test, i.e.,
 slashes separate options.
-- 
2.17.1

RE: [PATCH 2/2] aarch64: Add support for Cortex-R82

2020-09-09 Thread Kyrylo Tkachov



> -----Original Message-----
> From: Alex Coplan 
> Sent: 09 September 2020 11:15
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Richard Sandiford
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: [PATCH 2/2] aarch64: Add support for Cortex-R82
> 
> This patch adds support for Arm's Cortex-R82 CPU to GCC. For more
> information about this CPU, see [0].
> 
> Testing:
>  * Bootstrapped and regtested on aarch64-none-linux-gnu, no regressions.
> 
> [0] : https://developer.arm.com/ip-products/processors/cortex-r/cortex-r82
> 
> OK for trunk?

Ok.
Thanks,
Kyrill

> 
> ---
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-cores.def: Add Cortex-R82.
>   * config/aarch64/aarch64-tune.md: Regenerate.
>   * doc/invoke.texi: Add entry for Cortex-R82.
> 
> ---
>  gcc/config/aarch64/aarch64-cores.def | 3 +++
>  gcc/config/aarch64/aarch64-tune.md   | 2 +-
>  gcc/doc/invoke.texi  | 4 ++--
>  3 files changed, 6 insertions(+), 3 deletions(-)

RE: [PATCH 1/2] aarch64: Add support for Armv8-R

2020-09-09 Thread Kyrylo Tkachov

Hi Alex,

> -----Original Message-----
> From: Alex Coplan 
> Sent: 09 September 2020 11:15
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Richard Sandiford
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: [PATCH 1/2] aarch64: Add support for Armv8-R
> 
> Hello,
> 
> This patch adds support for Armv8-R AArch64 to GCC. It adds the -march
> value armv8-r and sets the ACLE feature macro __ARM_ARCH_PROFILE
> correctly when -march is set to armv8-r.
> 
> Testing:
>  * Bootstrapped and regtested on aarch64-none-linux-gnu.
>  * New unit test to check ACLE macro.
> 
> OK for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Alex
> 
> ---
> 
> gcc/ChangeLog:
> 
>   * common/config/aarch64/aarch64-common.c
>   (aarch64_get_extension_string_for_isa_flags): Don't force +crc for
>   Armv8-R.
>   * config/aarch64/aarch64-arches.def: Add entry for Armv8-R.
>   * config/aarch64/aarch64-c.c
> (aarch64_define_unconditional_macros): Set
>   __ARM_ARCH_PROFILE correctly for Armv8-R.
>   * config/aarch64/aarch64.h (AARCH64_FL_V8_R): New.
>   (AARCH64_FL_FOR_ARCH8_R): New.
>   (AARCH64_ISA_V8_R): New.
>   * doc/invoke.texi: Add Armv8-R to architecture table.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/acle/armv8-r.c: New test.
> 
> ---
>  gcc/common/config/aarch64/aarch64-common.c  | 7 +--
>  gcc/config/aarch64/aarch64-arches.def   | 1 +
>  gcc/config/aarch64/aarch64-c.c  | 3 ++-
>  gcc/config/aarch64/aarch64.h| 5 +
>  gcc/doc/invoke.texi | 1 +
>  gcc/testsuite/gcc.target/aarch64/acle/armv8-r.c | 6 ++
>  6 files changed, 20 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/armv8-r.c

[PATCH] MSP430: Fix CFA generation during function epilogues

2020-09-09 Thread Jozef Lawrynowicz

There is no CFA information generated for instructions which manipulate the
stack during function epilogues. This means a debugger cannot determine the
position of variables on the stack whilst the epilogue is in progress.

This can cause the debugger to give erroneous information when printing a
backtrace whilst stepping through the epilogue, or cause software watchpoints
set on stack variables to become invalidated after a function epilogue
is executed.

The patch fixes this by marking stack manipulation insns as
frame_related, and adding reg_note RTXs to stack pop instructions in the
epilogue.
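
To make the failure mode concrete, here is a small conceptual model (not
MSP430 backend code) of how a debugger derives the CFA as SP plus an
offset.  The struct frame_state, the helpers compute_cfa and pop_regs,
and the update_cfi switch are all hypothetical names invented for this
sketch; the word size is passed in as a parameter rather than assumed.

```c
#include <assert.h>

/* What the unwinder believes at a given point in the function.  */
struct frame_state
{
  unsigned sp;          /* current stack pointer value */
  unsigned cfa_offset;  /* unwinder's rule: CFA = SP + cfa_offset */
};

static unsigned
compute_cfa (const struct frame_state *s)
{
  return s->sp + s->cfa_offset;
}

/* Pop NREGS registers of WORD bytes each.  When UPDATE_CFI is nonzero the
   CFA rule is adjusted as well, which models a frame-related pop insn
   carrying a note; when zero, the rule goes stale.  */
static void
pop_regs (struct frame_state *s, unsigned nregs, unsigned word,
          int update_cfi)
{
  s->sp += nregs * word;
  if (update_cfi)
    s->cfa_offset -= nregs * word;
}
```

In this model the computed CFA stays constant across the pop only when
the rule is updated along with SP; otherwise it drifts by the number of
bytes popped, which is the kind of mismatch that confuses backtraces and
watchpoints on stack variables.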

Successfully regtested on trunk for msp430-elf in the default, -mlarge,
-mcpu=msp430 and -mlarge/-mcode-region=either/-mdata-region=either
configurations.

This fixes some tests from watchpoint.exp in the GDB testsuite.

Ok for trunk?

Thanks,
Jozef
From 272b38a374eddf7327a61ff9b1730f0a2dd40233 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 7 Sep 2020 20:34:40 +0100
Subject: [PATCH] MSP430: Fix CFA generation during function epilogues

There is no CFA information generated for instructions which manipulate the
stack during function epilogues. This means a debugger cannot determine the
position of variables on the stack whilst the epilogue is in progress.

This can cause the debugger to give erroneous information when printing a
backtrace whilst stepping through the epilogue, or cause software watchpoints
set on stack variables to become invalidated after a function epilogue
is executed.

The patch fixes this by marking stack manipulation insns as
frame_related, and adding reg_note RTXs to stack pop instructions in the
epilogue.

gcc/ChangeLog:

* config/msp430/msp430.c (increment_stack): Mark insns which increment
the stack as frame_related.
(msp430_expand_prologue): Add comments.
(msp430_expand_epilogue): Mark insns which decrement
the stack as frame_related.
Add reg_note to stack pop insns describing position of register
variables on the stack.

---
 gcc/config/msp430/msp430.c | 72 +++---
 1 file changed, 59 insertions(+), 13 deletions(-)

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index 129b916715e..1cb1b8f8626 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -1700,9 +1700,9 @@ increment_stack (HOST_WIDE_INT amount)
 {
   inc = GEN_INT (amount);
   if (TARGET_LARGE)
-   emit_insn (gen_addpsi3 (sp, sp, inc));
+   F (emit_insn (gen_addpsi3 (sp, sp, inc)));
   else
-   emit_insn (gen_addhi3 (sp, sp, inc));
+   F (emit_insn (gen_addhi3 (sp, sp, inc)));
 }
 }
 
@@ -2413,6 +2413,8 @@ msp430_expand_prologue (void)
   for (i = 15; i >= 4; i--)
 if (cfun->machine->need_to_save[i])
   {
+   /* We need to save COUNT sequential registers starting from regnum
+  I.  */
int seq, count;
rtx note;
 
@@ -2427,6 +2429,7 @@ msp430_expand_prologue (void)
p = F (emit_insn (gen_pushm (gen_rtx_REG (Pmode, i),
 GEN_INT (count))));
 
+   /* Document the stack decrement as a result of PUSHM.  */
note = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (count + 1));
 
XVECEXP (note, 0, 0)
@@ -2475,8 +2478,10 @@ msp430_expand_prologue (void)
 void
 msp430_expand_epilogue (int is_eh)
 {
-  int i;
+  int i, j;
   int fs;
+  rtx sp = stack_pointer_rtx;
+  rtx p;
   int helper_n = 0;
 
   if (is_naked_func ())
@@ -2545,19 +2550,27 @@ msp430_expand_epilogue (int is_eh)
   for (i = 4; i <= 15; i++)
 if (cfun->machine->need_to_save[i])
   {
-   int seq, count;
+   /* We need to restore COUNT sequential registers starting from regnum
+  I.  */
+   int seq;
+   int count = 1;
+   int helper_used = 0;
+   rtx note, addr;
 
-   for (seq = i + 1; seq <= 15 && cfun->machine->need_to_save[seq]; seq ++)
- ;
-   count = seq - i;
+   if (msp430x)
+ {
+   for (seq = i + 1; seq <= 15 && cfun->machine->need_to_save[seq];
+seq++)
+ ;
+   count = seq - i;
+ }
 
if (msp430x)
  {
/* Note: With TARGET_LARGE we still use
   POPM as POPX.A is two bytes bigger.  */
-   emit_insn (gen_popm (stack_pointer_rtx, GEN_INT (seq - 1),
-GEN_INT (count)));
-   i += count - 1;
+   p = F (emit_insn (gen_popm (stack_pointer_rtx, GEN_INT (seq - 1),
+   GEN_INT (count))));
  }
else if (i == 11 - helper_n
 && ! msp430_is_interrupt_func ()
@@ -2569,11 +2582,44 @@ msp430_expand_epilogue (int is_eh)
 && helper_n > 1
 && !is_eh)
  {
-   emit_jump_insn (gen_epilogue_helper (GEN_INT (helper_n)));
-   return;
+   p = F (emit_jump_insn (gen_epilogue_h

[OG10] Merge GCC into branch


OG10 = devel/omp/gcc-10

I have merged releases/gcc-10 into that branch.

This included the
'Fortran: Fixes for OpenMP loop-iter privatization (PRs 95109 + 94690)'
commit, which differs between mainline and GCC 10 – and OG10 is closer
to mainline. Hence, the attached patch was committed as follow-up patch
to add the missing bits.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
commit b329c58936ebbceb845e557081b3a1d0243b0794
Author: Tobias Burnus 
Date:   Wed Sep 9 12:06:54 2020 +0200

Fortran: Fixes for OpenMP loop-iter privatization (PRs 95109 + 94690)

Contains those parts of the mainline commit
61c2d476a52bb108bd05d0226c5522bf0c4b24b5 which are not in GCC 10 and,
hence, not merged in f93eba8c5fde83100bf0854607848b6f50c8bbb2, i.e. the
additional openmp.c change and the dump change for !nvptx.

And the follow-up commit cbc12c582462c720adccef5097b3162cc77c37a2 which does:
gfortran.dg/gomp/combined-if.f90: Update nvptx tree-dump times as
nvptx has additional omp simd lines with _simt_ with -O1 and higher.

gcc/fortran/ChangeLog:

PR fortran/95109
PR fortran/94690
* openmp.c (gfc_resolve_do_iterator): Remove special code for SIMD,
which is not needed.

gcc/testsuite/ChangeLog:

PR fortran/95109
PR fortran/94690
* gfortran.dg/gomp/combined-if.f90: Update scan-tree-dump-times for
'omp simd.*if'.

(cherry picked from commit cbc12c582462c720adccef5097b3162cc77c37a2)
---
 gcc/fortran/ChangeLog.omp  |  8 
 gcc/fortran/openmp.c   | 25 -
 gcc/testsuite/ChangeLog.omp|  8 
 gcc/testsuite/gfortran.dg/gomp/combined-if.f90 |  4 ++--
 4 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/gcc/fortran/ChangeLog.omp b/gcc/fortran/ChangeLog.omp
index e5465e0eb63..986eb7d31f2 100644
--- a/gcc/fortran/ChangeLog.omp
+++ b/gcc/fortran/ChangeLog.omp
@@ -1,3 +1,11 @@
+2020-09-09  Tobias Burnus  
+
+	Backport from mainline
+	2020-09-09  Tobias Burnus  
+
+	* openmp.c (gfc_resolve_do_iterator): Remove special code
+	for SIMD, which is not needed.
+
 2020-09-01  Tobias Burnus  
 
 	Backport from mainline
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 2626730127b..334125e7f9a 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -6064,31 +6064,6 @@ gfc_resolve_do_iterator (gfc_code *code, gfc_symbol *sym, bool add_clause)
   if (omp_current_ctx->sharing_clauses->contains (sym))
 return;
 
-  if (omp_current_ctx->is_openmp && omp_current_ctx->code->block)
-{
-  /* SIMD is handled differently and, hence, ignored here.  */
-  gfc_code *omp_code = omp_current_ctx->code->block;
-  for ( ; omp_code->next; omp_code = omp_code->next)
-	switch (omp_code->op)
-	  {
-	  case EXEC_OMP_SIMD:
-	  case EXEC_OMP_DO_SIMD:
-	  case EXEC_OMP_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_DISTRIBUTE_SIMD:
-	  case EXEC_OMP_DISTRIBUTE_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_SIMD:
-	  case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TARGET_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TARGET_SIMD:
-	  case EXEC_OMP_TASKLOOP_SIMD:
-	return;
-	  default:
-	break;
-	  }
-}
-
   if (! omp_current_ctx->private_iterators->add (sym) && add_clause)
 {
   gfc_omp_clauses *omp_clauses = omp_current_ctx->code->ext.omp_clauses;
diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index 132a4e7feba..d3e69738c1d 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,11 @@
+2020-09-09  Tobias Burnus  
+
+	Backport from mainline
+	2020-09-09  Tobias Burnus  
+
+	* gfortran.dg/gomp/combined-if.f90: Update scan-tree-dump-times
+	for 'omp simd.*if'.
+
 2020-09-01  Tobias Burnus  
 
 	Backport from mainline
diff --git a/gcc/testsuite/gfortran.dg/gomp/combined-if.f90 b/gcc/testsuite/gfortran.dg/gomp/combined-if.f90
index 0bb6c28b286..003821289a6 100644
--- a/gcc/testsuite/gfortran.dg/gomp/combined-if.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/combined-if.f90
@@ -104,6 +104,6 @@ contains
 end module
 
 ! { dg-final { scan-tree-dump-times "(?n)#pragma omp target.* if\\(" 9 "omplower" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma omp simd.* if\\(" 4 "omplower" { target { ! offload_nvptx } } } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma omp simd.* if\\(" 7 "omplower" { target { offload_nvptx } } } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma omp simd.* if\\(" 5 "omplower" { target { ! offload_nvptx } } } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma omp simd.* if\\(" 9 "omplo

[committed][nvptx] Fix boolean type test in write_fn_proto

Hi,

When running this libgomp testcase for nvptx accelerator:
...
/* { dg-do run } */
__uint128_t v;
int main () {
  #pragma omp target
  {
__uint128_t exp = 2;
__atomic_compare_exchange_n (&v, &exp, 7, false, __ATOMIC_RELEASE,
 __ATOMIC_ACQUIRE);
  }
}
...
we run into this assert in write_fn_proto:
...
913 gcc_assert (type == boolean_type_node);
...

This happens when doing some special-handling code for
__atomic_compare_exchange_1/2/4/8/16.  The function decls have a parameter
called weak of type bool, which is skipped when writing the decl because
the corresponding libatomic functions do not have that parameter.  The assert
is there to verify that we skip the correct parameter.

However, the assert fails because there are two distinct boolean types:
...
(gdb) call debug_generic_expr (type)
_Bool
(gdb) call debug_generic_expr (global_trees[TI_BOOLEAN_TYPE])
bool
...

Fix this by checking for TREE_CODE (type) == BOOLEAN_TYPE instead.
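
The difference between the two checks can be illustrated with a toy model.
This is only a sketch, not GCC code: struct toy_type, enum toy_code and the
two nodes below are hypothetical stand-ins for tree nodes, TREE_CODE and the
_Bool/bool types shown in the gdb session above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for GCC's tree codes.  */
enum toy_code { TOY_INTEGER_TYPE, TOY_BOOLEAN_TYPE };

/* Hypothetical stand-in for a type node.  */
struct toy_type
{
  enum toy_code code;
  const char *name;
};

/* Two distinct nodes that are both boolean types.  */
static const struct toy_type toy_global_bool = { TOY_BOOLEAN_TYPE, "bool" };
static const struct toy_type toy_c_bool = { TOY_BOOLEAN_TYPE, "_Bool" };

/* Identity check: true only for one specific node.  */
static bool
is_global_bool_node (const struct toy_type *t)
{
  return t == &toy_global_bool;
}

/* Kind check: true for any node whose code says "boolean type".  */
static bool
is_boolean_kind (const struct toy_type *t)
{
  return t->code == TOY_BOOLEAN_TYPE;
}
```

In this model is_global_bool_node (&toy_c_bool) is false while
is_boolean_kind (&toy_c_bool) is true, which mirrors why the pointer
comparison asserted and the code comparison does not.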

Tested libgomp on x86_64-linux with nvptx accelerator.

Likewise, tested that the test-case above does not ICE anymore.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix boolean type test in write_fn_proto

gcc/ChangeLog:

PR target/96991
* config/nvptx/nvptx.c (write_fn_proto): Fix boolean type check.

---
 gcc/config/nvptx/nvptx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 39d0275493a..6f393dfea01 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -910,7 +910,7 @@ write_fn_proto (std::stringstream &s, bool is_defn,
   if (not_atomic_weak_arg)
argno = write_arg_type (s, -1, argno, type, prototyped);
   else
-   gcc_assert (type == boolean_type_node);
+   gcc_assert (TREE_CODE (type) == BOOLEAN_TYPE);
 }
 
   if (stdarg_p (fntype))

Re: [PATCH][libatomic] Add nvptx support


Hi Tom,

On 9/8/20 5:05 PM, Tobias Burnus wrote:


On 9/8/20 8:51 AM, Tom de Vries wrote:

PR target/96964
* config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
expansion.
* sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.


I have your patch applied on a current mainline powerpc64le-none-linux-gnu
+ nvptx offloading build. And I observe the following fails – which seems
to be new and related to your patch (but I have not confirmed it by
reverting your libatomic patch).

Required option for the fail: "-O2 -ftracer",
hence, only the "-O3 ..." testsuite builds fail.
(-ftracer = "Perform tail duplication to enlarge superblock size.")


during RTL pass: mach
asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at 
config/nvptx/nvptx.c:3293
0x10bf9f13 nvptx_find_par
gcc/config/nvptx/nvptx.c:3293
0x10bf9b97 nvptx_find_par
gcc/config/nvptx/nvptx.c:3320
0x10bf9b97 nvptx_find_par
gcc/config/nvptx/nvptx.c:3320
...


The ICE occurs for the second assert of:
case CODE_FOR_nvptx_join:
  /* A loop tail.  Finish the current loop and return to
 parent.  */
  {
unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));

gcc_assert (par->mask == mask);
gcc_assert (par->join_block == NULL);

gdb shows:
(gdb) p debug_bb(par->join_block )
(note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
(insn 31 213 204 24 (unspec_volatile:SI [
(const_int 4 [0x4])
] UNSPECV_JOIN) 
"libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237 {nvptx_join}
 (nil))
(jump_insn 204 31 205 24 (set (pc)
(label_ref 198)) 121 {jump}
 (nil)
 -> 198)


That affects the testcases:
libgomp.oacc-fortran/asyncwait-1.f90
libgomp.oacc-fortran/asyncwait-2.f90
libgomp.oacc-fortran/asyncwait-3.f90
libgomp.oacc-fortran/atomic_capture-1.f90
libgomp.oacc-fortran/atomic_update-1.f90
libgomp.oacc-fortran/classtypes-1.f95
libgomp.oacc-fortran/collapse-1.f90
libgomp.oacc-fortran/collapse-2.f90
libgomp.oacc-fortran/collapse-3.f90
libgomp.oacc-fortran/collapse-4.f90
libgomp.oacc-fortran/collapse-5.f90
libgomp.oacc-fortran/collapse-6.f90
libgomp.oacc-fortran/collapse-7.f90
libgomp.oacc-fortran/collapse-8.f90
libgomp.oacc-fortran/combined-directives-1.f90
libgomp.oacc-fortran/combined-reduction.f90
libgomp.oacc-fortran/common-block-1.f90
libgomp.oacc-fortran/common-block-2.f90
libgomp.oacc-fortran/common-block-3.f90
libgomp.oacc-fortran/deep-copy-1.f90
libgomp.oacc-fortran/deep-copy-3.f90
libgomp.oacc-fortran/deep-copy-4.f90
libgomp.oacc-fortran/deep-copy-5.f90
libgomp.oacc-fortran/deep-copy-6-no_finalize.F90
libgomp.oacc-fortran/deep-copy-6.f90
libgomp.oacc-fortran/deep-copy-7.f90
libgomp.oacc-fortran/deep-copy-8.f90
libgomp.oacc-fortran/derived-type-1.f90
libgomp.oacc-fortran/host_data-2.f90
libgomp.oacc-fortran/host_data-3.f
libgomp.oacc-fortran/host_data-4.f90
libgomp.oacc-fortran/implicit-firstprivate-ref.f90
libgomp.oacc-fortran/lib-14.f90
libgomp.oacc-fortran/map-1.f90
libgomp.oacc-fortran/nested-function-1.f90
libgomp.oacc-fortran/nested-function-2.f90
libgomp.oacc-fortran/nested-function-3.f90
libgomp.oacc-fortran/no_create-3.F90
libgomp.oacc-fortran/optional-data-copyin.f90
libgomp.oacc-fortran/optional-data-copyout.f90
libgomp.oacc-fortran/optional-data-enter-exit.f90
libgomp.oacc-fortran/optional-declare.f90
libgomp.oacc-fortran/optional-firstprivate.f90
libgomp.oacc-fortran/optional-reduction.f90
libgomp.oacc-fortran/optional-update-device.f90
libgomp.oacc-fortran/optional-update-host.f90
libgomp.oacc-fortran/parallel-dims.f90
libgomp.oacc-fortran/parallel-loop-1.f90
libgomp.oacc-fortran/pr81352.f90
libgomp.oacc-fortran/pr84028.f90
libgomp.oacc-fortran/reduction-1.f90
libgomp.oacc-fortran/reduction-2.f90
libgomp.oacc-fortran/reduction-3.f90
libgomp.oacc-fortran/reduction-4.f90
libgomp.oacc-fortran/reduction-5.f90
libgomp.oacc-fortran/reduction-6.f90
libgomp.oacc-fortran/reduction-7.f90
libgomp.oacc-fortran/reduction-8.f90
libgomp.oacc-fortran/routine-1.f90
libgomp.oacc-fortran/routine-2.f90
libgomp.oacc-fortran/routine-3.f90
libgomp.oacc-fortran/routine-4.f90
libgomp.oacc-fortran/routine-7.f90
libgomp.oacc-fortran/routine-9.f90
libgomp.oacc-fortran/subarrays-1.f90
libgomp.oacc-fortran/subarrays-2.f90
libgomp.oacc-fortran/update-2.f90

Tobias


[PATCH] fix useless unsharing of SLP tree

This avoids unsharing the SLP tree when optimizing load permutations
for reductions but there is no actual permute taking place.
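
The check described above can be sketched outside of GCC as follows.  This
is a simplified model under stated assumptions: the function name, the plain
array standing in for the load permutation and the 64-element limit are all
hypothetical, chosen only to keep the example self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true only if PERM is a gap-free permutation of
   0 .. GROUP_SIZE-1 that moves at least one element.
   Assumption for this sketch: GROUP_SIZE is at most 64.  */
static bool
permutation_worth_applying (const unsigned *perm, size_t group_size)
{
  bool seen[64] = { false };
  bool any_permute = false;

  if (group_size > 64)
    return false;

  for (size_t i = 0; i < group_size; i++)
    {
      unsigned lidx = perm[i];
      if (lidx != i)
        any_permute = true;     /* Element i is loaded from elsewhere.  */
      if (lidx >= group_size)
        return false;           /* Index outside the group.  */
      if (seen[lidx])
        return false;           /* Same element loaded twice.  */
      seen[lidx] = true;
    }

  /* The identity permutation permutes nothing, so there is no work.  */
  if (!any_permute)
    return false;

  /* Mirror the gap check: every element must have been seen.  */
  for (size_t i = 0; i < group_size; i++)
    if (!seen[i])
      return false;

  return true;
}
```

With this model the identity {0, 1, 2, 3} is rejected early, while
{1, 0, 3, 2} is accepted.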

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-09-09  Richard Biener  

* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Do
nothing when the permutation doesn't permute.
---
 gcc/tree-vect-slp.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 03b11058bd5..15d57890b6f 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1905,11 +1905,14 @@ vect_attempt_slp_rearrange_stmts (slp_instance slp_instn)
 }
 
   /* Check that the loads in the first sequence are different and there
- are no gaps between them.  */
+ are no gaps between them and that there is an actual permutation.  */
+  bool any_permute = false;
   auto_sbitmap load_index (group_size);
   bitmap_clear (load_index);
   FOR_EACH_VEC_ELT (node->load_permutation, i, lidx)
 {
+  if (lidx != i)
+   any_permute = true;
   if (lidx >= group_size)
return false;
   if (bitmap_bit_p (load_index, lidx))
@@ -1917,6 +1920,8 @@ vect_attempt_slp_rearrange_stmts (slp_instance slp_instn)
 
   bitmap_set_bit (load_index, lidx);
 }
+  if (!any_permute)
+return false;
   for (i = 0; i < group_size; i++)
 if (!bitmap_bit_p (load_index, i))
   return false;
-- 
2.26.2

Re: [PATCH][libatomic] Add nvptx support

On 9/9/20 2:36 PM, Tobias Burnus wrote:
> Hi Tom,
> 
> On 9/8/20 5:05 PM, Tobias Burnus wrote:
> 
>> On 9/8/20 8:51 AM, Tom de Vries wrote:
>>>     PR target/96964
>>>     * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
>>>     expansion.
>>>     * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.
> 
> I have your patch applied on a current mainline powerpc64le-none-linux-gnu
> + nvptx offloading build.

Thanks for trying this out.

> And I observe the following fails – which seems
> to be new and related to your patch (but I have not confirmed it by
> reverting your libatomic patch).
> 

Could you confirm that?

Meanwhile, I'll try to reproduce on x86_64.

> Required option for the fail: "-O2 -ftracer",
> hence, only the "-O3 ..." testsuite builds fail.
> (-ftracer = "Perform tail duplication to enlarge superblock size.")
> 
> 
> during RTL pass: mach
> asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at
> config/nvptx/nvptx.c:3293
> 0x10bf9f13 nvptx_find_par
>     gcc/config/nvptx/nvptx.c:3293
> 0x10bf9b97 nvptx_find_par
>     gcc/config/nvptx/nvptx.c:3320
> 0x10bf9b97 nvptx_find_par
>     gcc/config/nvptx/nvptx.c:3320
> ...
> 
> 
> The ICE occurs for the second assert of:
>     case CODE_FOR_nvptx_join:
>   /* A loop tail.  Finish the current loop and return to
>  parent.  */
>   {
>     unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
> 
>     gcc_assert (par->mask == mask);
>     gcc_assert (par->join_block == NULL);
> 
> gdb shows:
> (gdb) p debug_bb(par->join_block )
> (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
> (insn 31 213 204 24 (unspec_volatile:SI [
>     (const_int 4 [0x4])
>     ] UNSPECV_JOIN)
> "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237
> {nvptx_join}
>  (nil))
> (jump_insn 204 31 205 24 (set (pc)
>     (label_ref 198)) 121 {jump}
>  (nil)
>  -> 198)
> 

Yep, code duplication works against the matching of fork/join, it's not
the first time we see this.

Usually the fix is to make an optimization pass conservative with
respect to these fork/join regions, but AFAICT, ftracer already has such
code in ignore_bb_p that tests gimple_call_internal_unique_p.

So, perhaps the ftracer pass is the trigger, but not the pass that does
the problematic transformation? Just a guess at this point.
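
To illustrate why a duplicated join block trips the second assert, here is a
toy model.  It is an assumption-laden sketch, not the nvptx code: struct
toy_par, NO_BLOCK and toy_record_join are hypothetical, and the function
returns false where the real code would assert.

```c
#include <stdbool.h>

#define NO_BLOCK (-1)

/* Hypothetical stand-in for a parallel region descriptor.  */
struct toy_par
{
  unsigned mask;    /* Partitioning mask of the region.  */
  int join_block;   /* Block holding the join, or NO_BLOCK.  */
};

/* Record BB as the join block of PAR.  Return false in the two
   situations that correspond to the asserts quoted above.  */
static bool
toy_record_join (struct toy_par *par, unsigned mask, int bb)
{
  if (par->mask != mask)
    return false;               /* Join does not match the region.  */
  if (par->join_block != NO_BLOCK)
    return false;               /* A join was already recorded.  */
  par->join_block = bb;
  return true;
}
```

If tail duplication leaves two blocks carrying the join marker for the same
region, the first call succeeds and the second one fails, which corresponds
to the join_block == NULL assert.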

Thanks,
- Tom


RE: [PING] floatformat.h: Add bfloat16 support.

2020-09-09 Thread Willgerodt, Felix via Gcc-patches

Thank you!

Felix

-Original Message-
From: Joseph Myers  
Sent: Tuesday, September 8, 2020 19:40
To: Willgerodt, Felix 
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PING] floatformat.h: Add bfloat16 support.

On Tue, 8 Sep 2020, Willgerodt, Felix via Gcc-patches wrote:

> Thanks for your review. It seems like the format issue was introduced 
> by my email client when hitting reply. Sorry for that! The original 
> patch is formatted correctly, as I used git send-email:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552079.html
> 
> Could you double-check and push the patch for me? This is the first 
> time I contribute to gcc and I therefore don't have write access.

I've now pushed this patch.

--
Joseph S. Myers
jos...@codesourcery.com
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

Hi!

On Tue, Sep 08, 2020 at 10:26:51AM +0200, Richard Biener wrote:
> Hmm, yeah - I guess that's what should be addressed first then.
> I'm quite sure that in case 'v' is not on the stack but in memory like
> in my case a SImode store is better than what we get from
> vec_insert - in fact vec_insert will likely introduce a RMW cycle
> which is prone to inserting store-data-races?

The other way around -- if it is in memory, and was stored as vector
recently, then reading back something shorter from it is prone to
SHL/LHS problems.  There is nothing special about the stack here, except
of course it is more likely to have been stored recently if on the
stack.  So it depends how often it has been stored recently which option
is best.  On newer CPUs, although they can avoid SHL/LHS flushes more
often, the penalty is relatively bigger, so memory does not often win.

I.e.: it needs to be measured.  Intuition is often wrong here.

Segher

[committed][nvptx] Fix Wformat in nvptx_assemble_decl_begin

Hi,

I'm running into this warning:
...
src/gcc/config/nvptx/nvptx.c: In function \
  ‘void nvptx_assemble_decl_begin(FILE*, const char*, const char*, \
  const_tree, long int, unsigned int, bool)’:
src/gcc/config/nvptx/nvptx.c:2229:29: warning: format ‘%d’ expects argument \
  of type ‘int’, but argument 5 has type ‘long unsigned int’ [-Wformat=]
 elt_size * BITS_PER_UNIT);
 ^
...
which I seem to have introduced in commit b9c7fe59f9f "[nvptx] Fix array
dimension in nvptx_assemble_decl_begin", but not noticed due to configuring
with --disable-build-format-warnings.

Fix this by using the appropriate format.
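
The same kind of fix can be shown in portable C.  This is a minimal sketch,
assuming a 64-bit unsigned element width: format_decl_prefix is a
hypothetical helper, and the standard PRIu64 macro stands in for the
GCC-internal format macro used in the patch below.  In both cases a
format-string macro is spliced in by string-literal concatenation.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: format a directive prefix into BUF.
   ELT_BITS is a 64-bit unsigned value, so "%d" would be the wrong
   conversion; PRIu64 expands to the matching one for uint64_t.  */
static int
format_decl_prefix (char *buf, size_t len, const char *section,
                    unsigned align_bytes, uint64_t elt_bits)
{
  return snprintf (buf, len, "%s .align %u .u%" PRIu64 " ",
                   section, align_bytes, elt_bits);
}
```

Building this with -Wformat gives no warning, whereas passing elt_bits to a
"%d" conversion would.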

Rebuilt cc1 on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix Wformat in nvptx_assemble_decl_begin

gcc/ChangeLog:

* config/nvptx/nvptx.c (nvptx_assemble_decl_begin): Fix Wformat
warning.

---
 gcc/config/nvptx/nvptx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 6f393dfea01..0376ad6ce9f 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2224,7 +2224,7 @@ nvptx_assemble_decl_begin (FILE *file, const char *name, const char *section,
  elt_size. */
   init_frag.remaining = (size + elt_size - 1) / elt_size;
 
-  fprintf (file, "%s .align %d .u%d ",
+  fprintf (file, "%s .align %d .u" HOST_WIDE_INT_PRINT_UNSIGNED " ",
   section, align / BITS_PER_UNIT,
   elt_size * BITS_PER_UNIT);
   assemble_name (file, name);

Re: [PATCH][libatomic] Add nvptx support

On 9/9/20 3:15 PM, Tom de Vries wrote:
> On 9/9/20 2:36 PM, Tobias Burnus wrote:
>> Hi Tom,
>>
>> On 9/8/20 5:05 PM, Tobias Burnus wrote:
>>
>>> On 9/8/20 8:51 AM, Tom de Vries wrote:
>>>>     PR target/96964
>>>>     * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
>>>>     expansion.
>>>>     * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.
>>
>> I have your patch applied on a current mainline powerpc64le-none-linux-gnu
>> + nvptx offloading build.
> 
> Thanks for trying this out.
> 
>> And I observe the following fails – which seems
>> to be new and related to your patch (but I have not confirmed it by
>> reverting your libatomic patch).
>>
> 
> Could you confirm that?
> 
> Meanwhile, I'll try to reproduce on x86_64.
> 
>> Required option for the fail: "-O2 -ftracer",
>> hence, only the "-O3 ..." testsuite builds fail.
>> (-ftracer = "Perform tail duplication to enlarge superblock size.")
>>
>>
>> during RTL pass: mach
>> asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at
>> config/nvptx/nvptx.c:3293
>> 0x10bf9f13 nvptx_find_par
>>     gcc/config/nvptx/nvptx.c:3293
>> 0x10bf9b97 nvptx_find_par
>>     gcc/config/nvptx/nvptx.c:3320
>> 0x10bf9b97 nvptx_find_par
>>     gcc/config/nvptx/nvptx.c:3320
>> ...
>>
>>
>> The ICE occurs for the second assert of:
>>     case CODE_FOR_nvptx_join:
>>   /* A loop tail.  Finish the current loop and return to
>>  parent.  */
>>   {
>>     unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
>>
>>     gcc_assert (par->mask == mask);
>>     gcc_assert (par->join_block == NULL);
>>
>> gdb shows:
>> (gdb) p debug_bb(par->join_block )
>> (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
>> (insn 31 213 204 24 (unspec_volatile:SI [
>>     (const_int 4 [0x4])
>>     ] UNSPECV_JOIN)
>> "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237
>> {nvptx_join}
>>  (nil))
>> (jump_insn 204 31 205 24 (set (pc)
>>     (label_ref 198)) 121 {jump}
>>  (nil)
>>  -> 198)
>>
> 
> Yep, code duplication works against the matching of fork/join, it's not
> the first time we see this.
> 
> Usually the fix is to make an optimization pass conservative with
> respect to these fork/join regions, but AFAICT, ftracer already has such
> code in ignore_bb_p that tests gimple_call_internal_unique_p.
> 
> So, perhaps the ftracer pass is the trigger, but not the pass that does
> the problematic transformation? Just a guess at this point.
> 

I can reproduce it, and it's indeed the ftracer pass that does the
duplication.  So, the question is why doesn't ignore_bb_p work.

Thanks,
- Tom


Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

On Wed, Sep 9, 2020 at 3:49 PM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Tue, Sep 08, 2020 at 10:26:51AM +0200, Richard Biener wrote:
> > Hmm, yeah - I guess that's what should be addressed first then.
> > I'm quite sure that in case 'v' is not on the stack but in memory like
> > in my case a SImode store is better than what we get from
> > vec_insert - in fact vec_insert will likely introduce a RMW cycle
> > which is prone to inserting store-data-races?
>
> The other way around -- if it is in memory, and was stored as vector
> recently, then reading back something shorter from it is prone to
> SHL/LHS problems.  There is nothing special about the stack here, except
> of course it is more likely to have been stored recently if on the
> stack.  So it depends how often it has been stored recently which option
> is best.  On newer CPUs, although they can avoid SHL/LHS flushes more
> often, the penalty is relatively bigger, so memory does not often win.
>
> I.e.: it needs to be measured.  Intuition is often wrong here.

But the current method would simply do a direct store to memory
without a preceding read of the whole vector.

Richard.

>
>
> Segher

Re: [PATCH][libatomic] Add nvptx support

On 9/9/20 4:14 PM, Tom de Vries wrote:
> On 9/9/20 3:15 PM, Tom de Vries wrote:
>> On 9/9/20 2:36 PM, Tobias Burnus wrote:
>>> Hi Tom,
>>>
>>> On 9/8/20 5:05 PM, Tobias Burnus wrote:
>>>
>>>> On 9/8/20 8:51 AM, Tom de Vries wrote:
>>>>>     PR target/96964
>>>>>     * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
>>>>>     expansion.
>>>>>     * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.
>>>
>>> I have your patch applied on a current mainline powerpc64le-none-linux-gnu
>>> + nvptx offloading build.
>>
>> Thanks for trying this out.
>>
>>> And I observe the following fails – which seems
>>> to be new and related to your patch (but I have not confirmed it by
>>> reverting your libatomic patch).
>>>
>>
>> Could you confirm that?
>>
>> Meanwhile, I'll try to reproduce on x86_64.
>>
>>> Required option for the fail: "-O2 -ftracer",
>>> hence, only the "-O3 ..." testsuite builds fail.
>>> (-ftracer = "Perform tail duplication to enlarge superblock size.")
>>>
>>>
>>> during RTL pass: mach
>>> asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at
>>> config/nvptx/nvptx.c:3293
>>> 0x10bf9f13 nvptx_find_par
>>>     gcc/config/nvptx/nvptx.c:3293
>>> 0x10bf9b97 nvptx_find_par
>>>     gcc/config/nvptx/nvptx.c:3320
>>> 0x10bf9b97 nvptx_find_par
>>>     gcc/config/nvptx/nvptx.c:3320
>>> ...
>>>
>>>
>>> The ICE occurs for the second assert of:
>>>     case CODE_FOR_nvptx_join:
>>>   /* A loop tail.  Finish the current loop and return to
>>>  parent.  */
>>>   {
>>>     unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
>>>
>>>     gcc_assert (par->mask == mask);
>>>     gcc_assert (par->join_block == NULL);
>>>
>>> gdb shows:
>>> (gdb) p debug_bb(par->join_block )
>>> (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
>>> (insn 31 213 204 24 (unspec_volatile:SI [
>>>     (const_int 4 [0x4])
>>>     ] UNSPECV_JOIN)
>>> "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237
>>> {nvptx_join}
>>>  (nil))
>>> (jump_insn 204 31 205 24 (set (pc)
>>>     (label_ref 198)) 121 {jump}
>>>  (nil)
>>>  -> 198)
>>>
>>
>> Yep, code duplication works against the matching of fork/join, it's not
>> the first time we see this.
>>
>> Usually the fix is to make an optimization pass conservative with
>> respect to these fork/join regions, but AFAICT, ftracer already has such
>> code in ignore_bb_p that tests gimple_call_internal_unique_p.
>>
>> So, perhaps the ftracer pass is the trigger, but not the pass that does
>> the problematic transformation? Just a guess at this point.
>>
> 
> I can reproduce it, and it's indeed the ftracer pass that does the
> duplication.  So, the question is why doesn't ignore_bb_p work.

Filed PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97000 for this.

Thanks,
- Tom

Re: [PATCH] tree-optimization/96043 - BB vectorization costing improvement

2020-09-09 Thread Michael Matz

Hello,

On Tue, 8 Sep 2020, Richard Biener wrote:

> CCing some people to double-check my graph partitioning algorithm.

I seem to not know the pre-existing data structures enough to say anything 
about this, but I noticed small things which might or might not indicate 
thinkos or incompleteness:

> +static void
> +vect_bb_partition_graph_r (bb_vec_info bb_vinfo,
> +slp_instance instance, slp_tree node,
> +hash_map 
> &stmt_to_instance,
> +hash_map 
> &instance_leader)
> +{
> +  stmt_vec_info stmt_info;
> +  unsigned i;
> +  bool all = true;
> +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> +{
> +  bool existed_p;
> +  slp_instance &stmt_instance
> + = stmt_to_instance.get_or_insert (stmt_info, &existed_p);
> +  if (!existed_p)
> + {
> +   all = false;
> + }
> +  else if (stmt_instance != instance)
> + {
> +   /* If we're running into a previously marked stmt make us the
> +  leader of the current ultimate leader.  This keeps the
> +  leader chain acyclic and works even when the current instance
> +  connects two previously independent graph parts.  */
> +   stmt_instance = get_ultimate_leader (stmt_instance, instance_leader);
> +   if (stmt_instance != instance)
> + instance_leader.put (stmt_instance, instance);
> + }
> +  stmt_instance = instance;

This last assignment is useless.

> +/* Partition the SLP graph into pieces that can be costed independently.  */
> +
> +static void
> +vect_bb_partition_graph (bb_vec_info bb_vinfo)
> +{
...
> +  /* Then collect entries to each independent subgraphs.  */
> +  for (unsigned i = 0; bb_vinfo->slp_instances.iterate (i, &instance); ++i)
> +{
> +  slp_instance leader = get_ultimate_leader (instance, instance_leader);
> +  if (leader == instance)
> + leader->subgraph_entries.safe_push (leader);
> +  else
> + {
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> +  "instance %p is leader of %p\n",
> +  leader, instance);
> +   leader->subgraph_entries.safe_push (instance);
> + }

So the 'leader->subgraph_entries.safe_push (instance)' is actually done 
unconditionally (the leader is leader of itself), only the debug dump is 
conditional.
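
A toy model makes the observation concrete.  This is only a sketch under
assumptions: instances are plain indices, leader_of is a hypothetical array
replacing the instance_leader hash map, and toy_ultimate_leader is a
simplified stand-in for get_ultimate_leader.

```c
#include <stddef.h>

#define NO_LEADER ((size_t) -1)

/* Follow leader links until reaching an instance with no leader.  */
static size_t
toy_ultimate_leader (size_t instance, const size_t *leader_of)
{
  while (leader_of[instance] != NO_LEADER)
    instance = leader_of[instance];
  return instance;
}

/* Count the entries of each subgraph.  Every instance is pushed onto
   its ultimate leader; when an instance is its own leader the pushed
   value is the same either way, so no branch is needed here.  */
static void
toy_collect_entries (const size_t *leader_of, size_t n, size_t *entry_count)
{
  for (size_t i = 0; i < n; i++)
    entry_count[i] = 0;
  for (size_t i = 0; i < n; i++)
    entry_count[toy_ultimate_leader (i, leader_of)]++;
}
```

For the chain 2 -> 1 -> 0 plus an independent instance 3, instance 0 ends up
with three entries and instance 3 with one.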


Ciao,
Michael.

Re: [PATCH][libatomic] Add nvptx support


Hi Tom,

On 9/9/20 3:15 PM, Tom de Vries wrote:

And I observe the following fails – which seems
to be new and related to your patch (but I have not confirmed it by
reverting your libatomic patch).


Could you confirm that?


It is an independent issue; it is newish and I can reproduce
it on x86-64-gnu-linux (not with -O2 but with -O3 -ftracer).

Tobias


[RFC] Aarch64: Replace nested FP min/max with conditionals for TX2

2020-09-09 Thread Anton Youdkevitch

The ThunderX2 (TX2) chip has the odd property that nested scalar FP min and
max are slower than the logically equivalent sequence of compares and branches.
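
As a source-level illustration of the two forms (a sketch only, not the
patch itself): min2 is a hypothetical stand-in for a scalar FP min
operation, and the equivalence is assumed to hold for inputs without NaNs.

```c
/* Stand-in for a scalar FP min operation (assumption: no NaN inputs).  */
static double
min2 (double a, double b)
{
  return a < b ? a : b;
}

/* Nested form: the outer min depends on the result of the inner one.  */
static double
min3_nested (double a, double b, double c)
{
  return min2 (min2 (a, b), c);
}

/* The same result written as explicit compares and branches.  */
static double
min3_branches (double a, double b, double c)
{
  if (a < b)
    {
      if (a < c)
        return a;
      return c;
    }
  if (b < c)
    return b;
  return c;
}
```

Both functions return the same value for the same non-NaN arguments; which
one is faster on a given core has to be measured.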

The attached patch is my attempt at implementing that transformation.
Please advise whether the "combine" pass (actually, right after the pass
itself) is the appropriate place to do this.

I considered implementing this in aarch64.md (which would be much
cleaner) but could not figure out how to make fmin/fmax survive until
later passes and replace them only then.

-- 
  Thanks,
  Anton


0001-WIP-MIN-to-conditionals-1.patch
Description: Binary data

Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

On Wed, Sep 09, 2020 at 04:28:19PM +0200, Richard Biener wrote:
> On Wed, Sep 9, 2020 at 3:49 PM Segher Boessenkool
>  wrote:
> >
> > Hi!
> >
> > On Tue, Sep 08, 2020 at 10:26:51AM +0200, Richard Biener wrote:
> > > Hmm, yeah - I guess that's what should be addressed first then.
> > > I'm quite sure that in case 'v' is not on the stack but in memory like
> > > in my case a SImode store is better than what we get from
> > > vec_insert - in fact vec_insert will likely introduce a RMW cycle
> > > which is prone to inserting store-data-races?
> >
> > The other way around -- if it is in memory, and was stored as vector
> > recently, then reading back something shorter from it is prone to
> > SHL/LHS problems.  There is nothing special about the stack here, except
> > of course it is more likely to have been stored recently if on the
> > stack.  So it depends how often it has been stored recently which option
> > is best.  On newer CPUs, although they can avoid SHL/LHS flushes more
> > often, the penalty is relatively bigger, so memory does not often win.
> >
> > I.e.: it needs to be measured.  Intuition is often wrong here.
> 
> But the current method would simply do a direct store to memory
> without a preceding read of the whole vector.

The problem is even worse the other way: you do a short store here, but
do a full vector read later.  If the store and read are far apart, that
is fine, but if they are close (that is on the order of fifty or more
insns), there can be problems.

There often are problems over function calls (where the compiler cannot
usually *see* how something is used).
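
To make the access pattern under discussion concrete, here is a minimal
C sketch.  The type `vec4i` and the function `insert_then_read` are made
up for illustration and are not from the patch; whether this pattern is
actually slow depends on the CPU and on how close together the two
accesses end up, so it still needs to be measured.

```c
/* Illustrative stand-in for a 128-bit vector of four ints.  */
typedef struct { int lane[4]; } vec4i;

/* A narrow store into one lane, followed shortly by a read of the
   whole vector: the "short store, full vector read" case.  */
vec4i
insert_then_read (vec4i *v, int value, unsigned int idx)
{
  v->lane[idx & 3] = value;   /* 4-byte store into the vector's memory */
  return *v;                  /* 16-byte load of the same memory */
}
```

The opposite order (a full vector store followed by a short load of one
element) is the other case mentioned in the thread.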


Segher

[PATCH] vec: don't select partial vectors when looping on full vectors

2020-09-09 Thread Andrea Corallo

Hi all,

this patch avoids generating predication in a loop when the
loop operates only on full vectors.

Ex:

#+BEGIN_SRC C
/* Vector length is 256.  */
void
f (int *restrict x, int *restrict y, unsigned int n) {
  for (unsigned int i = 0; i < n * 8; ++i)
    x[i] += y[i];
}
#+END_SRC

Compiling on aarch64 with -O3 -msve-vector-bits=256 current trunk
gives:

#+BEGIN_SRC asm
f:
.LFB0:
        .cfi_startproc
        lsl     w2, w2, 3
        cbz     w2, .L1
        mov     x3, 0
        whilelo p0.s, xzr, x2
        .p2align 3,,7
.L3:
        ld1w    z0.s, p0/z, [x0, x3, lsl 2]
        ld1w    z1.s, p0/z, [x1, x3, lsl 2]
        add     z0.s, z0.s, z1.s
        st1w    z0.s, p0, [x0, x3, lsl 2]
        add     x3, x3, 8
        whilelo p0.s, x3, x2
        b.any   .L3
.L1:
        ret
        .cfi_endproc
#+END_SRC

With the patch applied:

#+BEGIN_SRC asm
f:
.LFB0:
        .cfi_startproc
        lsl     w3, w2, 3
        cbz     w3, .L1
        mov     x2, 0
        ptrue   p0.b, vl32
        .p2align 3,,7
.L3:
        ld1w    z0.s, p0/z, [x0, x2, lsl 2]
        ld1w    z1.s, p0/z, [x1, x2, lsl 2]
        add     z0.s, z0.s, z1.s
        st1w    z0.s, p0, [x0, x2, lsl 2]
        add     x2, x2, 8
        cmp     x2, x3
        bne     .L3
.L1:
        ret
        .cfi_endproc
#+END_SRC

To achieve this we check earlier whether the loop needs peeling, and if
that is not the case we do not set LOOP_VINFO_USING_PARTIAL_VECTORS_P to true.

I moved some logic from 'determine_peel_for_niter' to
'vect_need_peeling_or_part_vects_p' so it can be used for this purpose.
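
As a rough illustration of the idea (not the code in tree-vect-loop.c),
the decision can be thought of as a divisibility check.  The helper
below is hypothetical and heavily simplified: it assumes a constant,
nonzero vectorization factor and ignores peeling for alignment and gaps.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the check described above.
   NITERS_KNOWN_MULTIPLE is a value the iteration count is known to be a
   multiple of (8 for "i < n * 8"); VF is the number of lanes per vector
   (8 for 32-bit ints in 256-bit vectors) and must be nonzero.
   Returns true when full vectors alone cannot cover the loop.  */
bool
needs_peeling_or_partial_vectors (unsigned int niters_known_multiple,
                                  unsigned int vf)
{
  return niters_known_multiple % vf != 0;
}
```

For the example above the count is a multiple of 8 and VF is 8, so the
check returns false and an all-true predicate (ptrue) is sufficient.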

Bootstrapped and regtested on aarch64-linux-gnu.

Feedback is welcome, thanks.

  Andrea

From fdcceaa420d6c3b03cf22ab50e0f9c393e8e3932 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Fri, 28 Aug 2020 16:01:15 +0100
Subject: [PATCH] vec: don't select partial vectors when unnecessary

gcc/ChangeLog

2020-09-09  Andrea Corallo  

* tree-vect-loop.c (vect_need_peeling_or_part_vects_p): New function.
(determine_peel_for_niter): Move out some logic into
'vect_need_peeling_or_part_vects_p'.

gcc/testsuite/ChangeLog

2020-09-09  Andrea Corallo  

* gcc.target/aarch64/sve/cost_model_10.c: New test.
* gcc.target/aarch64/sve/clastb_8.c: Update test for new
vectorization strategy.
* gcc.target/aarch64/sve/cost_model_5.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
---
 .../gcc.target/aarch64/sve/clastb_8.c |  5 +-
 .../gcc.target/aarch64/sve/cost_model_10.c| 12 +++
 .../gcc.target/aarch64/sve/cost_model_5.c |  4 +-
 .../gcc.target/aarch64/sve/struct_vect_14.c   |  8 +-
 .../gcc.target/aarch64/sve/struct_vect_15.c   |  8 +-
 .../gcc.target/aarch64/sve/struct_vect_16.c   |  8 +-
 .../gcc.target/aarch64/sve/struct_vect_17.c   |  8 +-
 gcc/tree-vect-loop.c  | 86 +++
 8 files changed, 81 insertions(+), 58 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c b/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c
index 57c42082449..e61ff4ac92d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c
@@ -23,7 +23,4 @@ TEST_TYPE (uint64_t);
 /* { dg-final { scan-assembler {\tclastb\t(h[0-9]+), p[0-7], \1, z[0-9]+\.h\n} } } */
 /* { dg-final { scan-assembler {\tclastb\t(s[0-9]+), p[0-7], \1, z[0-9]+\.s\n} } } */
 /* { dg-final { scan-assembler {\tclastb\t(d[0-9]+), p[0-7], \1, z[0-9]+\.d\n} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.b,} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.h,} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.s,} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.d,} } } */
+/* { dg-final { scan-assembler {\tptrue\tp[0-9]+\.b,} 4 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c
new file mode 100644
index 000..bfac09ed1c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c
@@ -0,0 +1,12 @@
+/* { dg-options "-O3 -msve-vector-bits=256" } */
+
+void
+f (int *restrict x, int *restrict y, unsigned int n)
+{
+  for (unsigned int i = 0; i < n * 8; ++i)
+    x[i] += y[i];
+}
+
+/* { dg-final { scan-assembler-not {\twhilelo\t} } } */
+/* { dg-final { scan-assembler {\tptrue\tp} } } */
+/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, x[0-9]+\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c
index 250ca837324..f3a29fc38a1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c
@@ -9,5 +9,

[PATCH] libphobos: Include to generate the CET marker for -fcf-protection

2020-09-09 Thread H.J. Lu via Gcc-patches

On Mon, Sep 7, 2020 at 7:09 PM H.J. Lu  wrote:
>
> On Mon, Sep 7, 2020 at 2:35 PM Iain Buclaw  wrote:
> >
> > Hi,
> >
> > This patch removes whatever CET support was in the switchContext routine
> > > for x86 D runtime, and instead uses the ucontext fallback, which properly
> > handles shadow stack handling.
> >
> > Rather than implementing support within D runtime itself, use libc
> > getcontext/setcontext functions if CET is enabled instead.
> >
> > HJ, does this look reasonable before I commit it?  The detection has
> > been done at configure-time, rather than adding a predefined version
> > condition for CET within the compiler.
> >
> > Done regression testing on x86_64-linux-gnu/-m32/-mx32.
> >
> > Regards
> > Iain.
> >
> > ---
> > libphobos/ChangeLog:
> >
> > PR d/95680
> > * Makefile.in: Regenerate.
> > * configure: Regenerate.
> > * configure.ac (DCFG_ENABLE_CET): Substitute.
> > * libdruntime/Makefile.in: Regenerate.
> > * libdruntime/config/x86/switchcontext.S: Remove CET support code.
> > * libdruntime/core/thread.d: Import gcc.config.  Don't set version
> > AsmExternal when GNU_Enable_CET is true.
> > * libdruntime/gcc/config.d.in (GNU_Enable_CET): Define.
> > * src/Makefile.in: Regenerate.
> > * testsuite/Makefile.in: Regenerate.
>
> Looks good.  I can try it on Tiger Lake after it has been checked in.
>

Here is the patch to enable the CET marker for -fcf-protection.
I saw some D run-time failures.  I will investigate them.

-- 
H.J.
From a6e0f81ceebb0fc8791340349b43270fce3d0bf1 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 8 Sep 2020 05:54:56 -0700
Subject: [PATCH] libphobos: Include  to generate the CET marker for
 -fcf-protection

Include  to generate the CET marker for -fcf-protection to avoid

/bin/ld: ../libdruntime/.libs/libgdruntime_convenience.a(libgdruntime_convenience_la-switchcontext.o): error: missing IBT and SHSTK properties

when -z cet-report=error is passed to the linker to create libgphobos.so
and libgdruntime.so.

	PR d/95680
	* libdruntime/config/x86/switchcontext.S: Include  to
	generate the CET marker for -fcf-protection.
---
 libphobos/libdruntime/config/x86/switchcontext.S | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libphobos/libdruntime/config/x86/switchcontext.S b/libphobos/libdruntime/config/x86/switchcontext.S
index 85f2e00d186..f2f8efa218e 100644
--- a/libphobos/libdruntime/config/x86/switchcontext.S
+++ b/libphobos/libdruntime/config/x86/switchcontext.S
@@ -24,6 +24,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #include "../common/threadasm.S"
 
+/* NB: Generate the CET marker for -fcf-protection.  */
+#ifdef __CET__
+# include 
+#endif
+
 #if defined(__i386__) && !defined(__CET__)
 
 .text
-- 
2.26.2

Re: [PATCH] libphobos: Include to generate the CET marker for -fcf-protection

2020-09-09 Thread Iain Buclaw via Gcc-patches

Excerpts from H.J. Lu's message of September 9, 2020 7:08 pm:
> On Mon, Sep 7, 2020 at 7:09 PM H.J. Lu  wrote:
>>
>> On Mon, Sep 7, 2020 at 2:35 PM Iain Buclaw  wrote:
>> >
>> > Hi,
>> >
>> > This patch removes whatever CET support was in the switchContext routine
>> > for x86 D runtime, and instead uses the ucontext fallback, which properly
>> > handles shadow stack handling.
>> >
>> > Rather than implementing support within D runtime itself, use libc
>> > getcontext/setcontext functions if CET is enabled instead.
>> >
>> > HJ, does this look reasonable before I commit it?  The detection has
>> > been done at configure-time, rather than adding a predefined version
>> > condition for CET within the compiler.
>> >
>> > Done regression testing on x86_64-linux-gnu/-m32/-mx32.
>> >
>> > Regards
>> > Iain.
>> >
>> > ---
>> > libphobos/ChangeLog:
>> >
>> > PR d/95680
>> > * Makefile.in: Regenerate.
>> > * configure: Regenerate.
>> > * configure.ac (DCFG_ENABLE_CET): Substitute.
>> > * libdruntime/Makefile.in: Regenerate.
>> > * libdruntime/config/x86/switchcontext.S: Remove CET support code.
>> > * libdruntime/core/thread.d: Import gcc.config.  Don't set version
>> > AsmExternal when GNU_Enable_CET is true.
>> > * libdruntime/gcc/config.d.in (GNU_Enable_CET): Define.
>> > * src/Makefile.in: Regenerate.
>> > * testsuite/Makefile.in: Regenerate.
>>
>> Looks good.  I can try it on Tiger Lake after it has been checked in.
>>
> 
> Here is the patch to enable the CET marker for -fcf-protection.
> I saw some D run-time failures.  I will investigate them.
> 

Thanks, feel free to commit.

Iain.

[committed][nvptx, libgcc] Fix Wbuiltin-declaration-mismatch in atomic.c

Hi,

When building for target nvptx, we get this and similar warnings for libgcc:
...
src/libgcc/config/nvptx/atomic.c:39:1: warning: conflicting types for \
  built-in function ‘__sync_val_compare_and_swap_1’; expected \
  ‘unsigned char(volatile void *, unsigned char,  unsigned char)’ \
  [-Wbuiltin-declaration-mismatch]
...

Fix this by making sure in atomic.c that the pointers used are of type
'volatile void *'.
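
For readers unfamiliar with the technique, here is a minimal
stand-alone sketch of a 1-byte compare-and-swap built on a 32-bit one,
with the pointer parameter typed as 'volatile void *' as in the fix.
The function name `subword_cas_1` is made up for this illustration, the
retry loop is my own simplification rather than a copy of atomic.c, and
the shift computation assumes a little-endian target.

```c
#include <stdint.h>

/* Illustrative only: emulate a 1-byte compare-and-swap using the
   32-bit __sync_val_compare_and_swap builtin (little-endian layout
   assumed).  Returns the byte value observed before the operation.  */
unsigned char
subword_cas_1 (volatile void *vptr, unsigned char oldval,
               unsigned char newval)
{
  volatile unsigned char *ptr = vptr;
  /* Aligned 32-bit word containing the byte.  */
  volatile unsigned int *wordptr
    = (volatile unsigned int *) ((uintptr_t) ptr & ~(uintptr_t) 3);
  /* Bit position of the byte inside that word.  */
  int shift = ((uintptr_t) ptr & 3) * 8;
  unsigned int valmask = 0xffu;
  unsigned int wordmask = ~(valmask << shift);
  unsigned int oldword = *wordptr;

  for (;;)
    {
      unsigned int curval = (oldword >> shift) & valmask;
      if (curval != oldval)
        return (unsigned char) curval;   /* comparison failed */

      unsigned int newword
        = (oldword & wordmask) | ((unsigned int) newval << shift);
      unsigned int prev
        = __sync_val_compare_and_swap (wordptr, oldword, newword);
      if (prev == oldword)
        return oldval;                   /* swap succeeded */
      oldword = prev;                    /* word changed; retry */
    }
}
```

Taking 'volatile void *' and converting to the typed pointer inside the
function is what lets the prototype match the built-in's expected
signature quoted in the warning.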

Tested by rebuilding atomic.c.

Committed to trunk.

Thanks,
- Tom

[nvptx, libgcc] Fix Wbuiltin-declaration-mismatch in atomic.c

libgcc/ChangeLog:

* config/nvptx/atomic.c (__SYNC_SUBWORD_COMPARE_AND_SWAP): Fix
Wbuiltin-declaration-mismatch.

---
 libgcc/config/nvptx/atomic.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libgcc/config/nvptx/atomic.c b/libgcc/config/nvptx/atomic.c
index e1ea078692a..60f21f3ff7f 100644
--- a/libgcc/config/nvptx/atomic.c
+++ b/libgcc/config/nvptx/atomic.c
@@ -36,10 +36,13 @@
 #define __SYNC_SUBWORD_COMPARE_AND_SWAP(TYPE, SIZE) \
 \
 TYPE\
-__sync_val_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval) \
+__sync_val_compare_and_swap_##SIZE (volatile void *vptr, TYPE oldval,   \
+   TYPE newval) \
 {   \
-  unsigned int *wordptr = (unsigned int *)((__UINTPTR_TYPE__ ) ptr & ~3UL);  \
-  int shift = ((__UINTPTR_TYPE__ ) ptr & 3UL) * 8;  \
+  volatile TYPE *ptr = vptr;\
+  volatile unsigned int *wordptr\
+    = (volatile unsigned int *)((__UINTPTR_TYPE__) ptr & ~3UL); \
+  int shift = ((__UINTPTR_TYPE__) ptr & 3UL) * 8;   \
   unsigned int valmask = (1 << (SIZE * 8)) - 1;  \
   unsigned int wordmask = ~(valmask << shift);  \
   unsigned int oldword = *wordptr;  \
@@ -64,7 +67,8 @@ __sync_val_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval) \
 }   \
 \
 bool\
-__sync_bool_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)\
+__sync_bool_compare_and_swap_##SIZE (volatile void *ptr, TYPE oldval,   \
+TYPE newval)\
 {   \
   return __sync_val_compare_and_swap_##SIZE (ptr, oldval, newval) == oldval; \
 }

[PATCH] x32: Update gcc.target/i386/builtin_thread_pointer.c

2020-09-09 Thread H.J. Lu via Gcc-patches

On Wed, Sep 9, 2020 at 1:17 AM Hongtao Liu  wrote:
>
> On Wed, Sep 9, 2020 at 2:35 PM Jakub Jelinek  wrote:
> >
> > On Wed, Sep 09, 2020 at 10:30:46AM +0800, Hongtao Liu wrote:
> > > From 400418fadce46e7db7bd37be45ef5ff5beb08d19 Mon Sep 17 00:00:00 2001
> > > From: liuhongt 
> > > Date: Tue, 8 Sep 2020 15:44:58 +0800
> > > Subject: [PATCH] Implement __builtin_thread_pointer for x86 TLS.
> > >
> > > gcc/ChangeLog:
> > >   PR target/96955
> > >   * config/i386/i386.md (get_thread_pointer): New
> > >   expander.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/i386/builtin_thread_pointer.c: New test.
> > > ---
> > >  gcc/config/i386/i386.md   | 10 +++
> > >  .../gcc.target/i386/builtin_thread_pointer.c  | 28 +++
> > >  2 files changed, 38 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/builtin_thread_pointer.c
> > >
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index 446793b78db..2f6eb0a7b98 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -15433,6 +15433,16 @@ (define_insn_and_split "*tls_local_dynamic_32_once"
> > >(clobber (reg:CC FLAGS_REG))])])
> > >
> > >  ;; Load and add the thread base pointer from %:0.
> > > +(define_expand "get_thread_pointer"
> > > +  [(set (match_operand:PTR 0 "register_operand")
> > > + (unspec:PTR [(const_int 0)] UNSPEC_TP))]
> > > +  ""
> > > +{
> > > +  /* targetm is not existed in the scope of condition.  */
> >
> > Reword as "targetm is not visible in the scope of the condition."
> > In fact, even if it was, it wouldn't help, because expand_builtin_thread_pointer
> > assumes that if the expander exists, then it will work and emit some code
> > and emits the error only if the expander doesn't exist.
> >
> > Ok for trunk with that change, thanks.
> >
>
> Thanks for the review.
>

I am checking in this for x32.

-- 
H.J.
From 64a1b56db4793833ff8d97c7fe1b200689250764 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 9 Sep 2020 10:29:47 -0700
Subject: [PATCH] x32: Update gcc.target/i386/builtin_thread_pointer.c

Update gcc.target/i386/builtin_thread_pointer.c for x32.  For

int
foo3 (int i)
{
  int* p = (int*) __builtin_thread_pointer ();
  return p[i];
}

we can't generate:

	movl	%fs:0(,%edi,4), %eax
	ret

for x32 since the address of %fs:0(,%edi,4) is %fs plus 0(,%edi,4)
zero-extended to 64 bits.  Instead we generate

	movl	%fs:0, %eax
	movl	(%eax,%edi,4), %eax

	PR target/96955
	* gcc.target/i386/builtin_thread_pointer.c: Update scan-assembler
	for x32
---
 gcc/testsuite/gcc.target/i386/builtin_thread_pointer.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/builtin_thread_pointer.c b/gcc/testsuite/gcc.target/i386/builtin_thread_pointer.c
index dce31488117..16a7ca49b99 100644
--- a/gcc/testsuite/gcc.target/i386/builtin_thread_pointer.c
+++ b/gcc/testsuite/gcc.target/i386/builtin_thread_pointer.c
@@ -25,4 +25,6 @@ foo3 (int i)
   return p[i];
 }
 
-/* { dg-final { scan-assembler "movl\[ \t\]*%\[fg\]s:0\\(,%\[a-z0-9\]*,4\\), %eax" } }  */
+/* { dg-final { scan-assembler "movl\[ \t\]*%\[fg\]s:0\\(,%\[a-z0-9\]*,4\\), %eax" { target { ! x32 } } } }  */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%fs:0\\(,%\[a-z0-9\]*,4\\), %eax" { target x32 } } }  */
+/* { dg-final { scan-assembler "movl\[ \t\]*\\(%eax,%edi,4\\), %eax" { target x32 } } }  */
-- 
2.26.2
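
The x32 addressing point in the commit message above can be checked
with plain integer arithmetic.  The sketch below is only a model under
my reading of that message: the function names are invented, the
segment base is passed in as an ordinary number, and no real %fs access
takes place.

```c
#include <stdint.h>

/* Model of %fs:0(,%edi,4) with 32-bit addressing: the index expression
   is computed in 32 bits, zero-extended to 64 bits, and only then
   added to the segment base.  */
uint64_t
addr_segment_override (uint64_t fs_base, int32_t i)
{
  uint32_t offset = (uint32_t) i * 4u;
  return fs_base + (uint64_t) offset;
}

/* Model of loading the 32-bit thread pointer into %eax first and then
   using (%eax,%edi,4): the whole sum wraps within 32 bits.  */
uint64_t
addr_two_step (uint32_t thread_pointer, int32_t i)
{
  return (uint32_t) (thread_pointer + (uint32_t) i * 4u);
}
```

With a base of 0x10000 the two forms agree for i = 1 (both give
0x10004) but differ for i = -1: the segment-override form yields
0x10000fffc while the two-step form yields 0xfffc, which is the element
just below the thread pointer.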

Re: [PING^2] [PATCH V2 0/4] Unify C and C++ handling of loops and switches


Ping again on this patch series:
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551927.html

These patches just missed making it into GCC 10 last year -- although 
there seemed to be agreement in principle, they needed a bit more work 
to resolve test regressions.  Now that we are heading into fall again, I 
am worried that they may miss GCC 11 as well if they need further 
re-working but I don't get feedback until very late in the release 
cycle, or any feedback at all.  :-(  I also have a set of OpenACC 
patches for identifying loops in kernels regions that depend on these; 
I'll be posting those shortly and I hope to get those into GCC 11 as well.


-Sandra

On 8/13/20 10:34 AM, Sandra Loosemore wrote:

This is a revised version of the patch set originally posted
last November:

https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534142.html

In addition to generally updating and rebasing the patches to reflect
other changes on mainline in the meantime, for this version I have
switched to using the C lowering strategy (directly to goto form)
rather than the C++ one (to LOOP_EXPR) because of regressions in the C
optimization tests.  Besides the ones previously noted in the original
patch submission, there were a bunch of new ones since November.  Some
of them were trivial to fix (e.g., flipping branch probabilities to
reflect the different sense of the loop exit condition in the
C++-style output), but I wasn't making much progress on others and
eventually decided to pursue the "plan B" of using the C-style output
everywhere, as discussed here:

https://gcc.gnu.org/pipermail/gcc-patches/2019-December/536536.html

The only regression I ran into with this was a bootstrap failure
building the Fortran front end from a new -Wmaybe-uninitialized error.
This might be a false positive but part 3 of the new series works
around it by adding an assertion to give g++ a hint.  Unfortunately I
had no luck in trying to reduce this to a standalone test case, but I
did observe that the failure went away when I compiled that file with
debugging enabled.  :-S  I could file a PR to look into this further if
the workaround is good enough for now.

-Sandra


Sandra Loosemore (4):
   Move loop and switch tree data structures from cp/ to c-family/.
   Use C-style loop lowering instead of C++-style.
   Work around bootstrap failure in Fortran front end.
   Change C front end to emit structured loop and switch tree nodes.

  gcc/c-family/c-common.c |  24 ++
  gcc/c-family/c-common.def   |  24 ++
  gcc/c-family/c-common.h |  53 +++-
  gcc/c-family/c-dump.c   |  38 +++
  gcc/c-family/c-gimplify.c   | 422
  gcc/c-family/c-pretty-print.c   |  92 ++-
  gcc/c/c-decl.c  |  18 +-
  gcc/c/c-lang.h  |   3 +-
  gcc/c/c-objc-common.h   |   2 +
  gcc/c/c-parser.c    | 125 +-
  gcc/c/c-tree.h  |  21 +-
  gcc/c/c-typeck.c    | 227 ++---
  gcc/cp/cp-gimplify.c    | 469 +++-
  gcc/cp/cp-objcp-common.c    |  13 +-
  gcc/cp/cp-tree.def  |  23 --
  gcc/cp/cp-tree.h    |  40 ---
  gcc/cp/cxx-pretty-print.c   |  78 --
  gcc/cp/dump.c   |  31 ---
  gcc/doc/generic.texi    |  56 +++--
  gcc/fortran/interface.c |   4 +
  gcc/objc/ChangeLog  |   5 +
  gcc/objc/objc-act.c |   6 +-
  gcc/testsuite/gcc.dg/gomp/block-7.c |  12 +-
  23 files changed, 938 insertions(+), 848 deletions(-)

[pushed] testsuite: Move auto-96647.C to cpp1y/.

This test uses a C++14 feature so fails with -std=c++11.  Therefore
I've moved it to cpp1y/ and used target c++14.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/auto-96647.C: Moved to...
* g++.dg/cpp1y/auto-96647.C: ...here.  Use target c++14.
---
 gcc/testsuite/g++.dg/{cpp0x => cpp1y}/auto-96647.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename gcc/testsuite/g++.dg/{cpp0x => cpp1y}/auto-96647.C (78%)

diff --git a/gcc/testsuite/g++.dg/cpp0x/auto-96647.C b/gcc/testsuite/g++.dg/cpp1y/auto-96647.C
similarity index 78%
rename from gcc/testsuite/g++.dg/cpp0x/auto-96647.C
rename to gcc/testsuite/g++.dg/cpp1y/auto-96647.C
index 314b2a16ac2..8cbe155415c 100644
--- a/gcc/testsuite/g++.dg/cpp0x/auto-96647.C
+++ b/gcc/testsuite/g++.dg/cpp1y/auto-96647.C
@@ -1,5 +1,5 @@
 // PR c++/96647
-// { dg-do compile { target c++11 } }
+// { dg-do compile { target c++14 } }
 
 template
 struct Base {

base-commit: bf69edf8ce47ca618eff30df2308279a40b22096
-- 
2.26.2

[PATCH 1/2] [OpenACC] Kernels loops annotation: C and C++.

This patch detects loops in kernels regions that are candidates for
parallelization, and adds "#pragma acc loop auto" annotations to them.
This annotation is controlled by the -fopenacc-kernels-annotate-loops
option, which is enabled by default.  -Wopenacc-kernels-annotate-loops
can be used to produce diagnostics about loops that cannot be
annotated.
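
As a small illustration of the kind of loop this targets (a made-up
example, not one of the new testcases), consider a plain for loop
inside a kernels region.  The function name `scale` and the data
clauses are assumptions chosen for the sketch; without -fopenacc the
pragma is simply ignored and the code runs sequentially.

```c
/* Hypothetical example of a loop in an OpenACC kernels region.  */
void
scale (float *restrict a, const float *restrict b, int n)
{
#pragma acc kernels copyin(b[0:n]) copyout(a[0:n])
  {
    /* No explicit loop directive here.  With
       -fopenacc-kernels-annotate-loops the front end is meant to treat
       a candidate loop like this as if "#pragma acc loop auto" had
       been written above it.  */
    for (int i = 0; i < n; i++)
      a[i] = 2.0f * b[i];
  }
}
```

Whether a given loop is actually annotated depends on the checks in the
patch; -Wopenacc-kernels-annotate-loops reports the loops that were
rejected.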

2020-09-08  Sandra Loosemore  

gcc/c-family/
* c-common.h (c_oacc_annotate_loops_in_kernels_regions): Declare.
* c-omp.c: Include tree-iterator.h.
(enum annotation_state): New.
(struct annotation_info): New.
(do_not_annotate_loop): New.
(do_not_annotate_loop_nest): New.
(annotation_error): New.
(c_finish_omp_for_internal): New.
(c_finish_omp_for): Use c_finish_omp_for_internal.
(is_local_var): New.
(end_test_ok_for_annotation_r): New.
(end_test_ok_for_annotation): New.
(lang_specific_unwrap_initializer): New.
(annotate_for_loop): New.
(annotate_and_check_for_loop): New.
(annotate_loops_in_kernels_regions): New.
(c_oacc_annotate_loops_in_kernels_regions): New.
* c.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.

gcc/c/
* c-decl.c (c_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.
* c-parser.c (c_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/cp/
* decl.c (cp_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.
* parser.c (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED.
* semantics.c (handle_omp_array_sections_1): Call STRIP_NOPS
on length and bound.
(handle_omp_array_sections): Likewise.

gcc/
* doc/invoke.texi (Option Summary): Add entries for
-Wopenacc-kernels-annotate-loops and
-fno-openacc-kernels-annotate-loops.
(Warning Options): Document -Wopenacc-kernels-annotate-loops.
(Optimization Options): Document
-fno-openacc-kernels-annotate-loops.
* tree.h (OACC_LOOP_COMBINED): New.

gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
-fno-openacc-kernels-annotate-loops option.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/combined-directives.c: Likewise.
* c-c++-common/goacc/kernels-counter-var-redundant-load.c:
Likewise.
* c-c++-common/goacc/kernels-counter-vars-function-scope.c:
Likewise.
* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
* c-c++-common/goacc/kernels-loop-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-3.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
* c-c++-common/goacc/kernels-loop-data.c: Likewise.
* c-c++-common/goacc/kernels-loop-g.c: Likewise.
* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
* c-c++-common/goacc/kernels-loop-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
* c-c++-common/goacc/kernels-loop.c: Likewise.
* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
Likewise.
* c-c++-common/goacc/kernels-reduction.c: Likewise.
* c-c++-common/goacc/kernels-loop-annotation-1.c: New.
* c-c++-common/goacc/kernels-loop-annotation-2.c: New.
* c-c++-common/goacc/kernels-loop-annotation-3.c: New.
* c-c++-common/goacc/kernels-loop-annotation-4.c: New.
* c-c++-common/goacc/kernels-loop-annotation-5.c: New.
* c-c++-common/goacc/kernels-loop-annotation-6.c: New.
* c-c++-common/goacc/kernels-loop-annotation-7.c: New.
* c-c++-common/goacc/kernels-loop-annotation-8.c: New.
* c-c++-common/goacc/kernels-loop-annotation-9.c: New.
* c-c++-common/goacc/kernels-loop-annotation-10.c: New.
* c-c++-common/goacc/kernels-loop-annotation-11.c: New.
* c-c++-common/goacc/kernels-loop-annotation-12.c: New.
* c-c++-common/goacc/kernels-loop-annotation-13.c: New.
* c-c++-common/goacc/kernels-loop-annotation-14.c: New.
* c-c++-common/goacc/kernels-loop-annotation-15.c: New.
* c-c++-common/goacc/kernels-loop-annotation-16.c: New.
* c-c++-common/goacc/kernels-loop-annotation-17.c: New.
* c-c++-common/goacc/kernels-loop-annotation-18.c: New.
* c-c++-common/goacc/kernels-loop-annotation-19.c: New.
* c-c++-common/goacc/kernels-loop-annotation-20.c: New.
* c-c++

[PATCH 0/2] [OpenACC] Kernels loop annotation

This set of patches implements C/C++ and Fortran front end support for
adding "acc loop auto" annotations to loop nests in OpenACC kernels
regions.  For background on this, refer to Thomas Schwinge's talk from
last year's cauldron, at

https://gcc.gnu.org/wiki/cauldron2019talks?action=AttachFile&do=view&target=OpenACC+kernels-cauldron2019.pdf

In particular, pages 20-24 describe this part of the work.  We're
trying to identify loops that might be parallelizable and convert them
to ACC_LOOP tree structures for further analysis, instead of lowering
them to goto form early in compilation, as we do with ordinary
for/while/do loops in C/C++ and DO loops in Fortran.

The C/C++ patches depend on my earlier not-yet-reviewed patch series
to unify the loop tree representations in the two front ends, which I
most recently reposted here:

https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551927.html

These patches have all been previously committed to the OG10 branch to
unblock other work on subsequent processing of the loops with auto
annotations, but without proper review.  Here I've mashed up the
followup bug fixes and incremental improvements I committed to the
branch together with the original patches to simplify review.  If
anyone cares, the corresponding commits on the OG10 branch were:

c96409c1f190e29fd9809890fb16d72556f3b7e6
fdbad20a57e03e05b608f19db41a454bc0cd1c47
6d670e648e76fe44589a42ee458098ff84d24af1
cb581bdb6689d74c1849b7e6bc139c6f122fdcc2
16d59cea8362c1ec731aa5b4db42a2817f036f23
7b436e90a4e03fdce5b0c6a8c452d3f23f1e136b
c2789b61cf29397295f39a43f5d1605ab8a32d87
1c9af55d7ff76e2e6b633af33e6e6991a0ba4c48
20f37fd2f9c8c52fff380982d6fc5eb2d88b3dd9
df5f2065bad30dc6aff9653237157c33fd4161cd

Sandra Loosemore (2):
  Kernels loops annotation: C and C++.
  Kernels loops annotation: Fortran.

 gcc/c-family/c-common.h|   1 +
 gcc/c-family/c-omp.c   | 916 +++--
 gcc/c-family/c.opt |   8 +
 gcc/c/c-decl.c |  28 +
 gcc/c/c-parser.c   |   3 +
 gcc/cp/decl.c  |  44 +
 gcc/cp/parser.c|   3 +
 gcc/cp/semantics.c |   9 +
 gcc/doc/invoke.texi|  34 +-
 gcc/fortran/gfortran.h |   1 +
 gcc/fortran/lang.opt   |   8 +
 gcc/fortran/openmp.c   | 415 ++
 gcc/fortran/parse.c|   9 +
 gcc/fortran/trans-openmp.c |  30 +-
 .../goacc/classify-kernels-unparallelized.c|   1 +
 .../c-c++-common/goacc/classify-kernels.c  |   1 +
 .../c-c++-common/goacc/combined-directives.c   |   2 +-
 .../goacc/kernels-counter-var-redundant-load.c |   1 +
 .../goacc/kernels-counter-vars-function-scope.c|   1 +
 .../goacc/kernels-double-reduction-n.c |   1 +
 .../c-c++-common/goacc/kernels-double-reduction.c  |   1 +
 gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c  |   1 +
 gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c  |   1 +
 .../c-c++-common/goacc/kernels-loop-annotation-1.c |  26 +
 .../goacc/kernels-loop-annotation-10.c |  32 +
 .../goacc/kernels-loop-annotation-11.c |  27 +
 .../goacc/kernels-loop-annotation-12.c |  28 +
 .../goacc/kernels-loop-annotation-13.c |  27 +
 .../goacc/kernels-loop-annotation-14.c |  22 +
 .../goacc/kernels-loop-annotation-15.c |  22 +
 .../goacc/kernels-loop-annotation-16.c |  26 +
 .../goacc/kernels-loop-annotation-17.c |  26 +
 .../goacc/kernels-loop-annotation-18.c |  18 +
 .../goacc/kernels-loop-annotation-19.c |  19 +
 .../c-c++-common/goacc/kernels-loop-annotation-2.c |  21 +
 .../goacc/kernels-loop-annotation-20.c |  23 +
 .../goacc/kernels-loop-annotation-21.c |  42 +
 .../goacc/kernels-loop-annotation-22.c |  41 +
 .../c-c++-common/goacc/kernels-loop-annotation-3.c |  24 +
 .../c-c++-common/goacc/kernels-loop-annotation-4.c |  34 +
 .../c-c++-common/goacc/kernels-loop-annotation-5.c |  27 +
 .../c-c++-common/goacc/kernels-loop-annotation-6.c |  27 +
 .../c-c++-common/goacc/kernels-loop-annotation-7.c |  26 +
 .../c-c++-common/goacc/kernels-loop-annotation-8.c |  27 +
 .../c-c++-common/goacc/kernels-loop-annotation-9.c |  26 +
 .../c-c++-common/goacc/kernels-loop-data-2.c   |   1 +
 .../goacc/kernels-loop-data-enter-exit-2.c |   1 +
 .../goacc/kernels-loop-data-enter-exit.c   |   1 +
 .../c-c++-common/goacc/kernels-loop-data-update.c  |   1 +
 .../c-c++-common/goacc/kernels-loop-data.c |   1 +
 gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c  |   1 +
 .../c-c++-common/goacc/kernels-loop-mod-not-zero.c |   1 +
 gcc/testsuite/c-c++-common/g

[PATCH 2/2] [OpenACC] Kernels loops annotation: Fortran.

This patch implements the Fortran support for adding "#pragma acc loop
auto" annotations to loops in OpenACC kernels regions.  It implements
the same -fopenacc-kernels-annotate-loops and
-Wopenacc-kernels-annotate-loops options that were previously added
(and documented) for the C/C++ front ends.

2020-09-08  Sandra Loosemore  
Gergö Barany 

gcc/fortran/

* gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions):
Declare.
* lang.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.
* openmp.c: Include options.h.
(enum annotation_state): New.
(enum annotation_result): New.
(check_code_for_invalid_calls): New.
(check_expr_for_invalid_calls): New.
(check_for_invalid_calls): New.
(annotate_do_loop): New.
(annotate_do_loops_in_kernels): New.
(compute_goto_targets): New.
(gfc_oacc_annotate_loops_in_kernels_regions): New.
* parse.c (gfc_parse_file): Handle
-fopenacc-kernels-annotate-loops.
* trans-openmp.c (gfc_trans_omp_do): Add combined parameter.
Use it to set OACC_LOOP_COMBINED.  Adjust call sites.

gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add
-fno-openacc-kernels-annotate-loops option.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/combined-directives.f90: Adjust patterns.
* gfortran.dg/goacc/common-block-3.f90: Add
-fno-openacc-kernels-annotate-loops option.
* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95:
Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95:
Likewise.
* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
* gfortran.dg/goacc/kernels-loop.f95: Likewise.
* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
Likewise.
* gfortran.dg/goacc/private-explicit-kernels-1.f95: Adjust
patterns.
* gfortran.dg/goacc/private-predetermined-kernels-1.f95:
Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-9.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-10.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-18.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-19.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-20.f95: New.
---
 gcc/fortran/gfortran.h |   1 +
 gcc/fortran/lang.opt   |   8 +
 gcc/fortran/openmp.c   | 415 +
 gcc/fortran/parse.c|   9 +
 gcc/fortran/trans-openmp.c |  30 +-
 .../goacc/classify-kernels-unparallelized.f95  |   1 +
 .../gfortran.dg/goacc/classify-kernels.f95 |   1 +
 .../gfortran.dg/goacc/combined-directives.f90  |  19 +-
 gcc/testsuite/gfortran.dg/goacc/common-block-3.f90 |   1 +
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 |   1 +
 .../goacc/kernels-loop-annotation-1.f95|  33 ++
 .../goacc/kernels-loop-annotation-10.f95   |  32 ++
 .../goacc/kernels-loop-annotation-11.f95   |  34 ++
 .../goacc/kernels-loop-annotation-12.f95   |  39 ++
 .../goacc/kernels-loop-annotation-13.f95   |  38 ++
 .../goacc/kernels-loop-annotation-14.f95   |  35 ++
 .../goacc/kernels-loop-annotation-15.f95   |  35 ++
 .../goacc/kernels-loop-annotation-16.f95   |  34 ++
 .../goacc/kernels-loop-annotation-18.f95   |  28 ++
 .../goacc/kernels-loop-annotation-19.f95   |  29 ++
 .../goacc/kernels-loop-annotation-2.f95|  32 ++
 .../goacc/kernels-loop-annotation-20.f95   |  26 ++
 .../goacc/kernels-loop-annotation-3.f95

[PATCH v1] [include] Add codes for DWARF v5 .dwp sections to dwarf2.h

2020-09-09 Thread Caroline Tice via Gcc-patches

For DWARF v5 Dwarf Package Files (.dwp files), the section identifier
encodings have changed. This patch updates dwarf2.h to contain the new
encodings.  (see http://dwarfstd.org/doc/DWARF5.pdf, section 7.3.5).
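
For readers without the spec at hand, the version 5 section identifiers can be sketched roughly as below. The numeric values follow my reading of the table in DWARF 5 section 7.3.5; the enum name `DwpSectionV5`, its enumerator names and the helper `dwp_section_name` are hypothetical and are not the identifiers the patch adds to dwarf2.h.

```cpp
#include <string>

// Hypothetical names; values per the DWARF 5 package file section
// identifier table (section 7.3.5).  Value 2 is reserved in version 5.
enum class DwpSectionV5 : unsigned {
  Info = 1,
  Reserved = 2,
  Abbrev = 3,
  Line = 4,
  Loclists = 5,
  StrOffsets = 6,
  Macro = 7,
  Rnglists = 8
};

// Map an identifier to the .dwo section it refers to.
std::string dwp_section_name(DwpSectionV5 id) {
  switch (id) {
    case DwpSectionV5::Info:       return ".debug_info.dwo";
    case DwpSectionV5::Abbrev:     return ".debug_abbrev.dwo";
    case DwpSectionV5::Line:       return ".debug_line.dwo";
    case DwpSectionV5::Loclists:   return ".debug_loclists.dwo";
    case DwpSectionV5::StrOffsets: return ".debug_str_offsets.dwo";
    case DwpSectionV5::Macro:      return ".debug_macro.dwo";
    case DwpSectionV5::Rnglists:   return ".debug_rnglists.dwo";
    default:                       return "(reserved or unknown)";
  }
}
```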

This patch has already been committed in binutils, but it needs to go into GCC
as well to avoid the binutils patch being overwritten/lost.

I tested this by running the regression testsuite; there were no regressions.

Is this ok to commit?

-- Caroline Tice
cmt...@google.com

include/ChangeLog

2020-09-09  Caroline Tice  

* dwarf2.h (enum dwarf_sect_v5): A new enum section for the
sections in a DWARF 5 DWP file (DWP version 5).


v1-0001-Add-codes-for-DWARF-v5-.dwp-sections-to-dwarf2.h.gcc.patch
Description: Binary data

[pushed] testsuite: Use C++14 in g++.dg/warn/Wnonnull6.C.

This test uses C++14 features so is failing with -std=c++11.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wnonnull6.C: Use target c++14.
---
 gcc/testsuite/g++.dg/warn/Wnonnull6.C | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/warn/Wnonnull6.C 
b/gcc/testsuite/g++.dg/warn/Wnonnull6.C
index dae6dd2d912..7eff7f6273b 100644
--- a/gcc/testsuite/g++.dg/warn/Wnonnull6.C
+++ b/gcc/testsuite/g++.dg/warn/Wnonnull6.C
@@ -1,7 +1,7 @@
 /* PR c++/95984 - Internal compiler error: Error reporting routines re-entered
-   in -Wnonnull on a variadic lamnda
+   in -Wnonnull on a variadic lambda
PR c++/missing -Wnonnull passing nullptr to a nonnull variadic lambda
-   { dg-do compile { target c++11 } }
+   { dg-do compile { target c++14 } }
{ dg-options "-Wall" } */
 
 typedef int F (int);

base-commit: 919373a6bfff415db7676c9f92a356ddfc501dfe
-- 
2.26.2

[PATCH] PR rtl-optimization/96791 Check precision of partial modes

2020-09-09 Thread Aaron Sawdey via Gcc-patches

Now that the documentation for partial modes says they have a known
number of bits of precision, would it make sense for extract_low_bits to
check this before attempting to extract the bits?

This would solve the problem we have been having with POImode and
extract_low_bits -- DSE tries to use it to extract part of a POImode
register used in a previous store. We do not want to supply any patterns
to make POImode (or OImode) used like a regular integer mode.

This patch adds such a check, and sets the precision of POImode to one
bit, which resolves the problems of PR/96791 for ppc64 target.
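
As an illustration only, the effect of the new check can be modelled outside of GCC. The `ToyMode` struct and `can_extract_low_bits` below are made-up stand-ins, not GCC types or functions; the comparison mirrors the `known_lt (GET_MODE_PRECISION (src_mode), GET_MODE_BITSIZE (mode))` test in the patch, under the assumption that both values are plain compile-time constants.

```cpp
// Toy stand-in for a machine mode: only the two properties the check uses.
struct ToyMode {
  const char* name;
  unsigned bitsize;    // storage size in bits
  unsigned precision;  // number of meaningful bits
};

// Mirrors the added early exit: refuse to extract when the source mode
// has fewer bits of precision than the destination mode needs.
bool can_extract_low_bits(const ToyMode& dst, const ToyMode& src) {
  if (src.precision < dst.bitsize)
    return false;  // corresponds to "return NULL_RTX" in the patch
  return true;
}

// With the patch, POImode keeps its 256-bit size but has 1 bit of precision.
const ToyMode toy_DI  = {"DI", 64, 64};
const ToyMode toy_SI  = {"SI", 32, 32};
const ToyMode toy_POI = {"POI", 256, 1};
```

In this model, extracting a DI value from a POI source is rejected, while extracting SI from DI is still allowed.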

Bootstrap passes on ppc64le and x86_64.

Thanks,
   Aaron

gcc/ChangeLog:

* config/rs6000/rs6000-modes.def (POImode): Change precision.
* expmed.c (extract_low_bits): Check precision.
---
 gcc/config/rs6000/rs6000-modes.def | 2 +-
 gcc/expmed.c   | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-modes.def 
b/gcc/config/rs6000/rs6000-modes.def
index ddb218b3fba..aa7d60dd835 100644
--- a/gcc/config/rs6000/rs6000-modes.def
+++ b/gcc/config/rs6000/rs6000-modes.def
@@ -90,5 +90,5 @@ INT_MODE (OI, 32);
 INT_MODE (XI, 64);
 
 /* Modes used by __vector_pair and __vector_quad.  */
-PARTIAL_INT_MODE (OI, 256, POI);   /* __vector_pair.  */
+PARTIAL_INT_MODE (OI, 1, POI); /* __vector_pair.  */
 PARTIAL_INT_MODE (XI, 512, PXI);   /* __vector_quad.  */
diff --git a/gcc/expmed.c b/gcc/expmed.c
index d34f0fb0b54..23ca181afa6 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -2396,6 +2396,9 @@ extract_low_bits (machine_mode mode, machine_mode 
src_mode, rtx src)
   if (GET_MODE_CLASS (mode) == MODE_CC || GET_MODE_CLASS (src_mode) == MODE_CC)
 return NULL_RTX;
 
+  if (known_lt (GET_MODE_PRECISION (src_mode), GET_MODE_BITSIZE (mode)))
+return NULL_RTX;
+
   if (known_eq (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
   && targetm.modes_tieable_p (mode, src_mode))
 {
-- 
2.17.1

Re: [PATCH v2] c++: Fix ICE in reshape_init with init-list [PR95164]

On Mon, Sep 07, 2020 at 06:23:01PM -0400, Jason Merrill via Gcc-patches wrote:
> On 9/4/20 5:39 PM, Marek Polacek wrote:
> > This patch fixes a long-standing bug in reshape_init_r.  Since r209314
> > we implement DR 1467 which handles list-initialization with a single
> > initializer of the same type as the target.  In this test this causes
> > a crash in reshape_init_r when we're processing a constructor that has
> > undergone the DR 1467 transformation.
> > 
> > Take e.g. the
> > 
> >foo({{1, {H{k}}}});
> > 
> > line in the attached test.  {H{k}} initializes the field b of H in I.
> > H{k} is a functional cast, so has TREE_HAS_CONSTRUCTOR set, so is
> > COMPOUND_LITERAL_P.  We perform the DR 1467 transformation and turn
> > {H{k}} into H{k}.  Then we attempt to reshape H{k} again and since
> > first_initializer_p is null and it's COMPOUND_LITERAL_P, we go here:
> > 
> > else if (COMPOUND_LITERAL_P (stripped_init))
> >   gcc_assert (!BRACE_ENCLOSED_INITIALIZER_P (stripped_init));
> 
> It looks to me like the bug is here:
> 
> >   /* [dcl.init.aggr]
> > All implicit type conversions (clause _conv_) are considered when
> > initializing the aggregate member with an initializer from an
> > initializer-list.  If the initializer can initialize a member,
> > the member is initialized.  Otherwise, if the member is itself a
> > non-empty subaggregate, brace elision is assumed and the
> > initializer is considered for the initialization of the first
> > member of the subaggregate.  */
> >   if (TREE_CODE (init) != CONSTRUCTOR
> >   /* But don't try this for the first initializer, since that would
> > be  looking through the
> > outermost braces; A a2 = { a1 }; is not a
> > valid aggregate initialization.  */
> >   && !first_initializer_p
> >   && (same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (init))
> >   || can_convert_arg (type, TREE_TYPE (init), init, LOOKUP_NORMAL,
> >   complain)))
> > {
> >   d->cur++;
> >   return init;
> > }
> 
> We ought to handle H{k} here, treat it as the initializer for the member,
> and not get as far as the code you quote above.

Like this?  When we have a COMPOUND_LITERAL_P, then I think we don't need
to check cxx11, or CLASS_TYPE, or d.end - d.cur, because that's inherent.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

-- >8 --
This patch fixes a long-standing bug in reshape_init_r.  Since r209314
we implement DR 1467 which handles list-initialization with a single
initializer of the same type as the target.  In this test this causes
a crash in reshape_init_r when we're processing a constructor that has
undergone the DR 1467 transformation.

Take e.g. the

  foo({{1, {H{k}}}});

line in the attached test.  {H{k}} initializes the field b of H in I.
H{k} is a functional cast, so has TREE_HAS_CONSTRUCTOR set, so is
COMPOUND_LITERAL_P.  We perform the DR 1467 transformation and turn
{H{k}} into H{k}.  Then we attempt to reshape H{k} again and since
first_initializer_p is null and it's COMPOUND_LITERAL_P, we go here:

   else if (COMPOUND_LITERAL_P (stripped_init))
 gcc_assert (!BRACE_ENCLOSED_INITIALIZER_P (stripped_init));

then complain about the missing braces, go to reshape_init_class and ICE
on
   gcc_checking_assert (d->cur->index
== get_class_binding (type, id));

because due to the missing { } we're looking for 'b' in H, but that's
not found.

So we have to be prepared to handle an initializer whose outer braces
have been removed due to DR 1467.
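
For reference, a minimal sketch of what the DR 1467 rule means at the source level. The types `H` and `I` and the functions here are invented for illustration and are not the contents of initlist123.C.

```cpp
// Hypothetical types for illustration only.
struct H { int a; };
struct I { int c; H b; };

int sum(I i) { return i.c + i.b.a; }

int demo(int k) {
  H h1{k};
  // DR 1467: a braced list with a single element of the same class type
  // initializes from that element, so this copies h1 rather than trying
  // to initialize h2.a from an H.
  H h2{h1};
  // The same rule applies to the nested list: {H{k}} initializes the
  // member b of I directly from the H{k} temporary.
  return sum({1, {H{k}}}) + h2.a;
}
```

With k == 2, demo returns 1 + 2 + 2.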

gcc/cp/ChangeLog:

PR c++/95164
* decl.c (reshape_init_r): When we've found a missing set of braces
as a result of the DR 1467 transformation, don't reshape again.

gcc/testsuite/ChangeLog:

PR c++/95164
* g++.dg/cpp0x/initlist123.C: New test.
---
 gcc/cp/decl.c|  8 -
 gcc/testsuite/g++.dg/cpp0x/initlist123.C | 39 
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist123.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 31d68745844..6565cd7199b 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6466,7 +6466,13 @@ reshape_init_r (tree type, reshape_iter *d, tree 
first_initializer_p,
  non-empty subaggregate, brace elision is assumed and the
  initializer is considered for the initialization of the first
  member of the subaggregate.  */
-  if (TREE_CODE (init) != CONSTRUCTOR
+  if ((TREE_CODE (init) != CONSTRUCTOR
+   /* If we previously elided the braces around the single element
+ of an initializer list when initializing an object of the same
+ class type, don't report missing braces or reshape again.  In
+ this case the braces had been enclosing a compound literal or
+ functional cast with aggregate, e.g. {

c++: omp reduction cleanups

2020-09-09 Thread Nathan Sidwell


omp reductions are modeled as nested functions, which is a thing C++
doesn't have.  Leading to much confusion until I figured out what was
happening.  Not helped by some duplicate code and inconsistencies in
the dependent and non-dependent paths.  This patch removes the parser
duplication and fixes up some bookkeeping.  Added some asserts and
comments too.
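
For context, a small sketch of a block-scope user-defined reduction, which is the situation where the reduction ends up local to an enclosing function. The names `Sum`, `merge` and `total_sum` are invented for this example; it is not taken from the testsuite. Without -fopenmp the pragmas are ignored and the loop runs serially with the same result.

```cpp
struct Sum { long v; };

long total_sum(const int* data, int n) {
  // User-defined reduction declared inside a function body.
  #pragma omp declare reduction(merge : Sum : omp_out.v += omp_in.v) initializer(omp_priv = Sum{0})
  Sum s = {0};
  #pragma omp parallel for reduction(merge : s)
  for (int i = 0; i < n; ++i)
    s.v += data[i];
  return s.v;
}
```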

gcc/cp/
* parser.c (cp_parser_omp_declare_reduction): Refactor to avoid
code duplication.  Update DECL_TI_TEMPLATE's context.
* pt.c (tsubst_expr): For OMP reduction function, set context to
global_namespace before pushing.
(tsubst_omp_udr): Assert current_function_decl, add comment about
decl context.

pushing to trunk

nathan
--
Nathan Sidwell
diff --git c/gcc/cp/parser.c w/gcc/cp/parser.c
index 9849e59d5aa..0da383937c2 100644
--- c/gcc/cp/parser.c
+++ w/gcc/cp/parser.c
@@ -42616,16 +42616,9 @@ cp_parser_omp_declare_reduction (cp_parser *parser, cp_token *pragma_tok,
 	  cp_parser_push_lexer_for_tokens (parser, cp);
 	  parser->lexer->in_pragma = true;
 	}
-  if (!cp_parser_omp_declare_reduction_exprs (fndecl, parser))
-	{
-	  if (!block_scope)
-	finish_function (/*inline_p=*/false);
-	  else
-	DECL_CONTEXT (fndecl) = current_function_decl;
-	  if (cp)
-	cp_parser_pop_lexer (parser);
-	  goto fail;
-	}
+
+  bool ok = cp_parser_omp_declare_reduction_exprs (fndecl, parser);
+
   if (cp)
 	cp_parser_pop_lexer (parser);
   if (!block_scope)
@@ -42633,6 +42626,14 @@ cp_parser_omp_declare_reduction (cp_parser *parser, cp_token *pragma_tok,
   else
 	{
 	  DECL_CONTEXT (fndecl) = current_function_decl;
+	  if (DECL_TEMPLATE_INFO (fndecl))
+	DECL_CONTEXT (DECL_TI_TEMPLATE (fndecl)) = current_function_decl;
+	}
+  if (!ok)
+	goto fail;
+
+  if (block_scope)
+	{
 	  block = finish_omp_structured_block (block);
 	  if (TREE_CODE (block) == BIND_EXPR)
 	DECL_SAVED_TREE (fndecl) = BIND_EXPR_BODY (block);
@@ -42641,6 +42642,7 @@ cp_parser_omp_declare_reduction (cp_parser *parser, cp_token *pragma_tok,
 	  if (processing_template_decl)
 	add_decl_expr (fndecl);
 	}
+
   cp_check_omp_declare_reduction (fndecl);
   if (cp == NULL && types.length () > 1)
 	cp = cp_token_cache_new (first_token,
diff --git c/gcc/cp/pt.c w/gcc/cp/pt.c
index a7b7a12b59f..4e212620eaf 100644
--- c/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -18077,7 +18077,10 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 			 && DECL_OMP_DECLARE_REDUCTION_P (decl)
 			 && DECL_FUNCTION_SCOPE_P (pattern_decl))
 		  {
-		DECL_CONTEXT (decl) = NULL_TREE;
+		/* We pretend this is regular local extern decl of
+		   a namespace-scope fn.  Then we make it really
+		   local, it is a nested function.  */
+		DECL_CONTEXT (decl) = global_namespace;
 		pushdecl (decl);
 		DECL_CONTEXT (decl) = current_function_decl;
 		cp_check_omp_declare_reduction (decl);
@@ -18899,7 +18902,7 @@ tsubst_omp_udr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
   if (t == NULL_TREE || t == error_mark_node)
 return;
 
-  gcc_assert (TREE_CODE (t) == STATEMENT_LIST);
+  gcc_assert (TREE_CODE (t) == STATEMENT_LIST && current_function_decl);
 
   tree_stmt_iterator tsi;
   int i;
@@ -18919,6 +18922,8 @@ tsubst_omp_udr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 			 args, complain, in_decl);
   tree omp_in = tsubst (DECL_EXPR_DECL (stmts[1]),
 			args, complain, in_decl);
+  /* tsubsting a local var_decl leaves DECL_CONTEXT null, as we
+	 expect to be pushing it.  */
   DECL_CONTEXT (omp_out) = current_function_decl;
   DECL_CONTEXT (omp_in) = current_function_decl;
   keep_next_level (true);

Re: [PATCH] Practical Improvement to libgcc Complex Divide

2020-09-09 Thread Patrick McGehearty via Gcc-patches

On 9/9/2020 2:13 AM, Richard Biener wrote:

> Thanks for working on this. Speaking about performance and
> accuracy I spot a few opportunities to use FMAs [and eventually
> vectorization] - do FMAs change anything on the accuracy analysis
> (is there the chance they'd make it worse?). We might want to use
> IFUNCs in libgcc to version for ISA variants (with/without FMA)?
>
> Thanks,
> Richard.

Richard, Thank you for bringing up the issue of fused multiply-add
(fma). All the results I presented in the latest patch were measured
with fma active. That's because in my early testing I ran experiments
and found that fma was consistently more accurate than no fma.

In response to your query, I repeated that set of tests on my
final submission and present them in the following table.

Number of results out of 10 million with greater than
or equal to the listed number of bits in error.

             full range          limited exponents
           no fma   with fma     no fma   with fma
 1 bits=    20088      16664      34479      24707
 2 bits=     1110        900       2359       1762
 3 bits=      518        440       1163        882
 4 bits=      197        143        612        445
 5 bits=      102         72        313        232
 6 bits=       49         43        170        119
 7 bits=       25         21         82         49
 8 bits=       16         11         33         26
 9 bits=        9          5         14         14
10 bits=        3          3          8          4
11 bits=        2          2          3          2
12 bits=        1          1          0          2
No differences for 13 or greater bits.

Errors for both cases drop off rapidly as we increase the
number of bits required for a result to be considered
an error.

While using fma shows a consistent advantage in fewer errors,
there are cases where no fma gives a more accurate answer.
A detailed examination of the full range/7 bit case,
which is listed as having 25 errors greater than 7 bits
for "no fma" and 21 errors greater than 7 bits for "fma",
illustrates this.

In that test,
1 case had the same size error for both
8 cases had a larger error with no fma
21 cases had a larger error with fma.

Further examination showed those differences were generally less than
two bits. That summary makes clear that while using fma does not always
give a better answer, it occasionally provides a slight improvement.
Even in the limited exponent case where 1 bit
difference is counted as an error, using fma or not using
fma only shows a different result about 1 time in 1000.
While interesting, that size and frequency of difference
is not enough to support having two versions of the library
routine in my mind.
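
To make the "with fma" versus "no fma" distinction concrete, here is a minimal sketch. It is not the libgcc code; `product_error` and `real_numerator_fma` are names invented for this example. A fused multiply-add rounds once instead of twice, which is why it can recover the rounding error of a product exactly and why the sums of products in a complex divide can differ in the last bits.

```cpp
#include <cmath>

// Rounding error of the double product a*b.  The product is rounded once
// into p; fma then computes a*b - p with a single rounding, which is exact
// as long as no overflow or underflow occurs.
double product_error(double a, double b) {
  double p = a * b;
  return std::fma(a, b, -p);
}

// Real-part numerator a*c + b*d of (a + ib) / (c + id): b*d is rounded,
// then a*c is added with one further rounding instead of two.
double real_numerator_fma(double a, double b, double c, double d) {
  return std::fma(a, c, b * d);
}
```

For a = 1 + 2^-30, the exact square is 1 + 2^-29 + 2^-60; the last term does not fit in a double, and product_error(a, a) returns exactly that lost 2^-60.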

I'd rather put further effort into improving the accuracy or performance
of other libgcc/glibc math functions. Just as one example, Paul
Zimmermann's work shows opportunities. For more details, see:
https://members.loria.fr/PZimmermann/papers/accuracy.pdf

- patrick

Re: BoF DWARF5 patches (25% .debug section size reduction)

2020-09-09 Thread Mark Wielaard

Hi,

I added some fixes to binutils gas to make sure that it always
generates DWARF5 style .debug_rnglists and .debug_line with
.debug_line_str which are more efficient than the older .debug_ranges
and .debug_line data.

There is also a pending patch to make it possible to always pass
--gdwarf-N to gas even if gcc generates its own .debug_info and
.debug_line sections.
https://sourceware.org/pipermail/binutils/2020-September/113220.html

Ideally we have a configure check to make sure that as accepts
--gdwarf-N for N={2,3,4,5} and that when the assembly file already
contains a .debug_info or .debug_line section as doesn't error out. I
haven't written that yet. It requires some way to create target
specific assembler (could be NOPs) to check the .debug_line creation
by as.

Then we can add --gdwarf-N to ASM_SPEC when gcc generates debuginfo
for DWARF version N. The below has part of that, but always uses
--gdwarf-5, and adds it even for targets that don't use DWARF, which
is obviously wrong. But I don't fully understand if I should express
this in the spec or just depend on some configure check conditionals.

To compare the .debug section sizes generated between the current gcc
master default (DWARF4) on x86_64 and using DWARF5 by default I am
using binutils master plus the above unapproved patch plus the
attached patch to gcc to enable DWARF5 by default, pass --gdwarf-5 to
as and adding Jakub's patch to keep the static member variables in C++
classes. It keeps locview enabled for now to make the comparison more
fair.

For libstdc++.so we get a 21M file with current master and a 17M file
when making DWARF5 the default. The debug sections look as follows:

master lib64/libstdc++.so.6.0.29:
[31] .debug_aranges  PROGBITS  001da430 00015000 00 0  1
[32] .debug_info PROGBITS  001ef430 0079f7c3 00 0  1
[33] .debug_abbrev   PROGBITS  0098ebf3 00054e1a 00 0  1
[34] .debug_line PROGBITS  009e3a0d 001779c0 00 0  1
[35] .debug_str  PROGBITS  00b5b3cd 0012fbb6 1 MS 0 0  1
[36] .debug_loc  PROGBITS  00c8af83 005c05b0 00 0  1
[37] .debug_ranges   PROGBITS  0124b533 001b1140 00 0  1

dwarf5 lib64/libstdc++.so.6.0.29:
[32] .debug_aranges  PROGBITS  001d9350 00015000 00 0  1
[33] .debug_info PROGBITS  001ee350 0078b3d1 00 0  1
[34] .debug_abbrev   PROGBITS  00979721 00055972 00 0  1
[35] .debug_line PROGBITS  009cf093 0015c20b 00 0  1
[36] .debug_str  PROGBITS  00b2b29e 00130b55 1 MS 0 0  1
[37] .debug_loclists PROGBITS  00c5bdf3 00299d88 00 0  1
[38] .debug_rnglists PROGBITS  00ef5b7b 0009357e 00 0  1
[39] .debug_line_str PROGBITS  00f890f9 1685 1 MS 0 0  1

master:
.debug_aranges  00015000 0.08M
.debug_info 0079f7c3 7.62M
.debug_abbrev   00054e1a 0.33M
.debug_line 001779c0 1.47M
.debug_str  0012fbb6 1.19M
.debug_loc  005c05b0 5.75M
.debug_ranges   001b1140 1.69M
18.13M

dwarf5:
.debug_aranges  00015000 0.08M
.debug_info 0078b3d1 7.54M
.debug_abbrev   00055972 0.33M
.debug_line 0015c20b 1.36M
.debug_str  00130b55 1.19M
.debug_loclists 00299d88 2.60M
.debug_rnglists 0009357e 0.58M
.debug_line_str 1685 0.01M
13.69M

So the total size difference is 4.4MB. The DWARF5 loclists and
rnglists are much smaller than the DWARF4 locs and ranges, and the
.debug_line section is slightly smaller because all directory/file
strings are now shared in .debug_line_str. .debug_info is also a
little bit smaller.
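
The totals above can be rechecked from the hex sizes in the section headers. The sketch below is only a convenience for that arithmetic; the `Section` struct and function names are invented, and "M" is taken to mean 2^20 bytes.

```cpp
#include <cstdint>
#include <vector>

struct Section { const char* name; std::uint64_t size; };

// Section sizes copied from the two listings above.
const std::vector<Section> master_sections = {
  {".debug_aranges", 0x00015000}, {".debug_info", 0x0079f7c3},
  {".debug_abbrev", 0x00054e1a},  {".debug_line", 0x001779c0},
  {".debug_str", 0x0012fbb6},     {".debug_loc", 0x005c05b0},
  {".debug_ranges", 0x001b1140},
};
const std::vector<Section> dwarf5_sections = {
  {".debug_aranges", 0x00015000},  {".debug_info", 0x0078b3d1},
  {".debug_abbrev", 0x00055972},   {".debug_line", 0x0015c20b},
  {".debug_str", 0x00130b55},      {".debug_loclists", 0x00299d88},
  {".debug_rnglists", 0x0009357e}, {".debug_line_str", 0x1685},
};

std::uint64_t total_bytes(const std::vector<Section>& sections) {
  std::uint64_t sum = 0;
  for (const Section& s : sections)
    sum += s.size;
  return sum;
}

double to_mib(std::uint64_t bytes) { return bytes / (1024.0 * 1024.0); }
```

Comparing to_mib(total_bytes(master_sections)) with to_mib(total_bytes(dwarf5_sections)) gives roughly 18.13 versus 13.69, a saving of a little under a quarter of the debug section size.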

Cheers,

Mark

diff --git a/gcc/common.opt b/gcc/common.opt
index dd68c61ae1d2..755df5445905 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3144,7 +3144,7 @@ Common Driver JoinedOrMissing Negative(gdwarf-)
 Generate debug information in default version of DWARF format.
 
 gdwarf-
-Common Driver Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs)
+Common Driver Joined UInteger Var(dwarf_version) Init(5) Negative(gstabs)
 Generate debug information in DWARF v2 (or later) format.
 
 ggdb
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bca8c856dc82..d69aa253f72b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9057,13 +9057,14 @@ possible.
 @opindex gdwarf
 Produce debugging information in DWARF format (if that is supported).
 The value of @var{version} may be either 2, 3, 4 or 5; the default version
-for most targets is 4.  DWARF Version 5 is only experimental.
+for most targets is 5 (with the exception of vxworks and darwin which
+default to version 2).
 
 Note that with DWARF Version 2, some ports require and always
 use some non-conflicting DWARF 3 extensions in the unwind tables.
 
 Version 4 may require GDB 7.0 and @option{-fvar-tracking-assignments}
-for maximum benefit.
+for maximum benefit. Version 5

Re: [PATCH v1] [include] Add codes for DWARF v5 .dwp sections to dwarf2.h

2020-09-09 Thread Joseph Myers

On Wed, 9 Sep 2020, Caroline Tice via Gcc-patches wrote:

> For DWARF v5 Dwarf Package Files (.dwp files), the section identifier
> encodings have changed. This patch updates dwarf2.h to contain the new
> encodings.  (see http://dwarfstd.org/doc/DWARF5.pdf, section 7.3.5).
> 
> This patch has already been committed in binutils, but it needs to go into GCC
> as well to avoid the binutils patch being overwritten/lost.
> 
> I tested this by running the regression testsuite; there were no regressions.
> 
> Is this ok to commit?

In my view, anyone with write access should feel free to merge changes to 
shared files from the binutils-gdb tree at any time, without needing 
separate approval.

The only exception would be if there is deliberate divergence.  The only 
case I know of for deliberate divergence is to allow autotools versions to 
be updated separately in the two trees, since even updating one tree is 
quite involved (the last time that happened, it was a while between when 
Simon did the update in binutils-gdb and when I did the corresponding 
update for GCC; between the updates, care was needed about merging changes 
to auto*-related files).

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 1/3] libstdc++: Simplify std::copy istreambuf_iterator overload

2020-09-09 Thread François Dumont via Gcc-patches

libstdc++: Use only public basic_streambuf methods in __copy_move_a2 overload


The __copy_move_a2 overload for istreambuf_iterator can be implemented using
the public basic_streambuf members in_avail and sgetn, so that __copy_move_a2
does not need to be a basic_streambuf friend.
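
The idea can be tried outside the library with nothing but the public interface. The sketch below is not the patched __copy_move_a2; `drain_available` is a made-up helper that copies whatever in_avail reports, using sgetn. Note that in_avail is only an estimate of the characters available without blocking, so for some stream buffers it may report 0 before anything has been read; the example uses a string buffer, where that does not arise.

```cpp
#include <cstddef>
#include <sstream>
#include <streambuf>
#include <string>

// Copy characters out of sb for as long as in_avail() reports some,
// relying only on public basic_streambuf members.
std::string drain_available(std::streambuf& sb) {
  std::string out;
  std::streamsize avail = sb.in_avail();
  while (avail > 0) {
    const std::size_t old_size = out.size();
    out.resize(old_size + static_cast<std::size_t>(avail));
    const std::streamsize got = sb.sgetn(&out[old_size], avail);
    out.resize(old_size + static_cast<std::size_t>(got));
    if (got == 0)
      break;  // defensive: avoid spinning if nothing was delivered
    avail = sb.in_avail();
  }
  return out;
}
```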

libstdc++-v3/ChangeLog:

    * include/std/streambuf (__copy_move_a2): Remove friend declaration.
    * include/bits/streambuf_iterator.h (__copy_move_a2): Re-implement using
    streambuf in_avail and sgetn.

Tested under Linux x86_64.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h b/libstdc++-v3/include/bits/streambuf_iterator.h
index 184c82cd5bf..8712b90edd6 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -367,31 +367,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		   istreambuf_iterator<_CharT> __last, _CharT* __result)
 {
   typedef istreambuf_iterator<_CharT>		   __is_iterator_type;
-  typedef typename __is_iterator_type::traits_type	   traits_type;
   typedef typename __is_iterator_type::streambuf_type  streambuf_type;
-  typedef typename traits_type::int_type		   int_type;
 
   if (__first._M_sbuf && !__last._M_sbuf)
 	{
 	  streambuf_type* __sb = __first._M_sbuf;
-	  int_type __c = __sb->sgetc();
-	  while (!traits_type::eq_int_type(__c, traits_type::eof()))
+	  std::streamsize __avail = __sb->in_avail();
+	  while (__avail > 0)
 	{
-	  const streamsize __n = __sb->egptr() - __sb->gptr();
-	  if (__n > 1)
-		{
-		  traits_type::copy(__result, __sb->gptr(), __n);
-		  __sb->__safe_gbump(__n);
-		  __result += __n;
-		  __c = __sb->underflow();
-		}
-	  else
-		{
-		  *__result++ = traits_type::to_char_type(__c);
-		  __c = __sb->snextc();
-		}
+	  __result += __sb->sgetn(__result, __avail);
+	  __avail = __sb->in_avail();
 	}
 	}
+
   return __result;
 }
 
diff --git a/libstdc++-v3/include/std/streambuf b/libstdc++-v3/include/std/streambuf
index cae35e75bda..13db284eb58 100644
--- a/libstdc++-v3/include/std/streambuf
+++ b/libstdc++-v3/include/std/streambuf
@@ -149,12 +149,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   friend streamsize
   __copy_streambufs_eof<>(basic_streambuf*, basic_streambuf*, bool&);
 
-  template
-friend typename __gnu_cxx::__enable_if<__is_char<_CharT2>::__value,
-	   _CharT2*>::__type
-__copy_move_a2(istreambuf_iterator<_CharT2>,
-		   istreambuf_iterator<_CharT2>, _CharT2*);
-
   template
 friend typename __gnu_cxx::__enable_if<__is_char<_CharT2>::__value,
   istreambuf_iterator<_CharT2> >::__type

[PATCH 2/3] libstdc++: Simplify std::advance istreambuf_iterator overload

2020-09-09 Thread François Dumont via Gcc-patches


libstdc++: Use only public basic_streambuf methods in std::advance overload

The std::advance overload for istreambuf_iterator can be implemented using
the public basic_streambuf pubseekoff method, so that it doesn't have to be
a basic_streambuf friend.
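
A stand-alone sketch of the same technique, for experimenting without the library internals. `skip_input` is an invented helper, not the patched std::advance; it moves the get position with pubseekoff and reports whether the buffer really moved by the requested distance. It assumes the stream buffer supports relative seeking, which string buffers do.

```cpp
#include <ios>
#include <sstream>
#include <streambuf>

// Try to move the input position of sb forward by n characters.
// Returns false if the buffer cannot seek or cannot move that far.
bool skip_input(std::streambuf& sb, std::streamoff n) {
  typedef std::streambuf::pos_type pos_type;
  const pos_type failed = pos_type(std::streamoff(-1));

  const pos_type before =
      sb.pubseekoff(0, std::ios_base::cur, std::ios_base::in);
  if (before == failed)
    return false;
  const pos_type after =
      sb.pubseekoff(n, std::ios_base::cur, std::ios_base::in);
  if (after == failed)
    return false;
  return after - before == n;
}
```

On an istringstream holding "abcdef", skipping 3 leaves 'd' as the next character, while asking for 100 fails.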

libstdc++-v3/ChangeLog:

    * include/std/streambuf
    (advance(istreambuf_iterator<>&, _Distance)): Remove friend declaration.
    * include/bits/streambuf_iterator.h
    (advance(istreambuf_iterator<>&, _Distance)): Re-implement using
    streambuf pubseekoff.
    * testsuite/25_algorithms/advance/istreambuf_iterators/char/3.cc: New
    test.

Tested under Linux x86_64.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h b/libstdc++-v3/include/bits/streambuf_iterator.h
index 8712b90edd6..afe967e5f03 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -453,37 +453,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   __glibcxx_assert(__n > 0);
   __glibcxx_requires_cond(!__i._M_at_eof(),
-			  _M_message(__gnu_debug::__msg_inc_istreambuf)
-			  ._M_iterator(__i));
-
-  typedef istreambuf_iterator<_CharT>		   __is_iterator_type;
-  typedef typename __is_iterator_type::traits_type	   traits_type;
-  typedef typename __is_iterator_type::streambuf_type  streambuf_type;
-  typedef typename traits_type::int_type		   int_type;
-  const int_type __eof = traits_type::eof();
-
-  streambuf_type* __sb = __i._M_sbuf;
-  while (__n > 0)
+			  _M_message(__gnu_debug::__msg_advance_oob)
+			  ._M_iterator(__i)
+			  ._M_integer(__n));
+
+  typedef basic_streambuf<_CharT> __streambuf_t;
+  typedef typename __streambuf_t::pos_type __pos_t;
+  __pos_t __cur_pos
+	= __i._M_sbuf->pubseekoff(0, ios_base::cur, ios_base::in);
+  __pos_t __new_pos
+	= __i._M_sbuf->pubseekoff(__n, ios_base::cur, ios_base::in);
+  __i._M_c = char_traits<_CharT>::eof();
+
+  if (__new_pos - __cur_pos != __n)
 	{
-	  streamsize __size = __sb->egptr() - __sb->gptr();
-	  if (__size > __n)
-	{
-	  __sb->__safe_gbump(__n);
-	  break;
-	}
-
-	  __sb->__safe_gbump(__size);
-	  __n -= __size;
-	  if (traits_type::eq_int_type(__sb->underflow(), __eof))
-	{
-	  __glibcxx_requires_cond(__n == 0,
-_M_message(__gnu_debug::__msg_inc_istreambuf)
-._M_iterator(__i));
-	  break;
-	}
+	  __i._M_sbuf = 0;
+	  __glibcxx_requires_cond(!__i._M_at_eof(),
+  _M_message(__gnu_debug::__msg_advance_oob)
+  ._M_iterator(__i)
+  ._M_integer(__n));
 	}
-
-  __i._M_c = __eof;
 }
 
 // @} group iterators
diff --git a/libstdc++-v3/include/std/streambuf b/libstdc++-v3/include/std/streambuf
index 13db284eb58..53892636e47 100644
--- a/libstdc++-v3/include/std/streambuf
+++ b/libstdc++-v3/include/std/streambuf
@@ -155,11 +155,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 find(istreambuf_iterator<_CharT2>, istreambuf_iterator<_CharT2>,
 	 const _CharT2&);
 
-  template
-friend typename __gnu_cxx::__enable_if<__is_char<_CharT2>::__value,
-	   void>::__type
-advance(istreambuf_iterator<_CharT2>&, _Distance);
-
   friend void __istream_extract(istream&, char*, streamsize);
 
   template
diff --git a/libstdc++-v3/testsuite/25_algorithms/advance/istreambuf_iterators/char/3.cc b/libstdc++-v3/testsuite/25_algorithms/advance/istreambuf_iterators/char/3.cc
new file mode 100644
index 000..06b38fea91e
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/advance/istreambuf_iterators/char/3.cc
@@ -0,0 +1,49 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// Debug mode would detect the invalid std::advance call.
+// { dg-require-normal-mode "" }
+
+#include 
+#include 
+#include 
+
+#include 
+
+void test01()
+{
+  using namespace std;
+
+  typedef istreambuf_iterator<char> in_iterator_type;
+
+  const char data1[] = "Drei Phantasien nach Friedrich Holderlin";
+  istringstream iss1(data1);
+  in_iterator_type beg1(iss1);
+  in_iterator_type end1;
+
+  VERIFY( beg1 != end1 );
+
+  advance(beg1, sizeof(data1));
+
+  VERIFY( beg1 == end1 );
+}
+
+int main()
+{
+  test01();
+  return 0;
+}

[PATCH 3/3] libstdc++: Add std::advance ostreambuf_iterator overload

2020-09-09 Thread François Dumont via Gcc-patches


libstdc++: Add std::advance overload for ostreambuf_iterator

Implement a std::advance overload for ostreambuf_iterator using
basic_streambuf pubseekoff.
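
What this enables can be approximated today by seeking on the stream buffer directly. The helper below, `overwrite_at`, is invented for illustration and does not use the new overload; it assumes an ostringstream whose put position starts at the beginning of its initial content.

```cpp
#include <ios>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>

// Overwrite one character of text at the given offset by seeking the
// put position and writing through an ostreambuf_iterator.
std::string overwrite_at(const std::string& text, std::streamoff offset,
                         char c) {
  typedef std::streambuf::pos_type pos_type;
  std::ostringstream out(text);
  std::streambuf* sb = out.rdbuf();

  const pos_type pos =
      sb->pubseekoff(offset, std::ios_base::cur, std::ios_base::out);
  if (pos == pos_type(std::streamoff(-1)))
    return text;  // seek failed: leave the text untouched

  std::ostreambuf_iterator<char> it(sb);
  *it = c;
  return out.str();
}
```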

libstdc++-v3/ChangeLog:

    * include/bits/streambuf_iterator.h (ostreambuf_iterator): Add
    std::advance friend declaration.
    (advance(ostreambuf_iterator<>&, _Distance)): New.
    * testsuite/25_algorithms/advance/ostreambuf_iterator/char/1.cc:
    New test.
    * testsuite/25_algorithms/advance/ostreambuf_iterator/char/1_neg.cc:
    New test.
    * testsuite/25_algorithms/advance/ostreambuf_iterator/char/2.cc:
    New test.
    * testsuite/25_algorithms/advance/ostreambuf_iterator/char/2_neg.cc:
    New test.

Tested under Linux x86_64.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h b/libstdc++-v3/include/bits/streambuf_iterator.h
index afe967e5f03..1f6613e9ef6 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -257,6 +257,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	copy(istreambuf_iterator<_CharT2>, istreambuf_iterator<_CharT2>,
 	 ostreambuf_iterator<_CharT2>);
 
+  template
+	friend typename __gnu_cxx::__enable_if<__is_char<_CharT2>::__value,
+	   void>::__type
+	advance(ostreambuf_iterator<_CharT2>&, _Distance);
+
 private:
   streambuf_type*	_M_sbuf;
   bool		_M_failed;
@@ -405,7 +410,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 typename __gnu_cxx::__enable_if<__is_char<_CharT>::__value,
-		  		istreambuf_iterator<_CharT> >::__type
+istreambuf_iterator<_CharT> >::__type
 find(istreambuf_iterator<_CharT> __first,
 	 istreambuf_iterator<_CharT> __last, const _CharT& __val)
 {
@@ -475,6 +480,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 }
 
+  template
+typename __gnu_cxx::__enable_if<__is_char<_CharT>::__value,
+void>::__type
+advance(ostreambuf_iterator<_CharT>& __i, _Distance __n)
+{
+  if (__n == 0)
+	return;
+
+  __glibcxx_assert(__n > 0);
+  __glibcxx_requires_cond(!__i.failed(),
+			  _M_message(__gnu_debug::__msg_advance_oob)
+			  ._M_iterator(__i)
+			  ._M_integer(__n));
+
+  typedef basic_streambuf<_CharT> __streambuf_t;
+  typedef typename __streambuf_t::pos_type __pos_t;
+  __pos_t __cur_pos
+	= __i._M_sbuf->pubseekoff(0, ios_base::cur, ios_base::out);
+  __pos_t __new_pos =
+	__i._M_sbuf->pubseekoff(__n, ios_base::cur, ios_base::out);
+
+  if (__new_pos - __cur_pos != __n)
+	{
+	  __i._M_failed = true;
+	  __glibcxx_requires_cond(!__i.failed(),
+  _M_message(__gnu_debug::__msg_advance_oob)
+  ._M_iterator(__i)
+  ._M_integer(__n));
+	}
+}
+
 // @} group iterators
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1.cc b/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1.cc
new file mode 100644
index 000..dd70cc67c75
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1.cc
@@ -0,0 +1,55 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include 
+#include 
+#include 
+
+#include 
+
+void test01()
+{
+  using namespace std;
+
+  const char data1[] = "Drei Phantasien nach Friedrich Holderlin";
+  string str1(data1);
+  str1[17] = 'i';
+
+  ostringstream oss1(str1);
+  ostreambuf_iterator<char> beg1(oss1);
+
+  std::advance(beg1, 17);
+  *beg1 = 'a';
+
+  VERIFY( !beg1.failed() );
+  VERIFY( oss1.str() == data1 );
+  str1 = oss1.str();
+
+  // -1 for the trailing '\0'
+  // -1 for the beg1 assignment.
+  std::advance(beg1, sizeof(data1) - 17 - 1 - 1);
+  *beg1 = '.';
+
+  str1 += '.';
+  VERIFY( oss1.str() == str1 );
+}
+
+int main()
+{
+  test01();
+  return 0;
+}
diff --git a/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1_neg.cc b/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1_neg.cc
new file mode 100644
index 000..8d266256ed3
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1_neg.cc
@@ -0,0 +1,40 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.

Re: [PATCH v3] c++: Further tweaks for new-expression and paren-init [PR77841]


On 9/8/20 10:34 PM, Marek Polacek wrote:

On Tue, Sep 08, 2020 at 04:19:42PM -0400, Jason Merrill wrote:

On 9/8/20 4:06 PM, Marek Polacek wrote:

On Mon, Sep 07, 2020 at 11:19:47PM -0400, Jason Merrill wrote:

On 9/6/20 11:34 AM, Marek Polacek wrote:

@@ -3944,9 +3935,9 @@ build_new (location_t loc, vec **placement, 
tree type,
}
  /* P1009: Array size deduction in new-expressions.  */
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && !TYPE_DOMAIN (type)
-  && *init)
+  const bool deduce_array_p = (TREE_CODE (type) == ARRAY_TYPE
+  && !TYPE_DOMAIN (type));
+  if (*init && (deduce_array_p || (nelts && cxx_dialect >= cxx20)))


Looks like this won't handle new (char[4]), for which we also get an
ARRAY_TYPE.


Good catch.  Fixed & paren-init37.C added.


{
  /* This means we have 'new T[]()'.  */
  if ((*init)->is_empty ())
@@ -3955,16 +3946,20 @@ build_new (location_t loc, vec 
**placement, tree type,
  CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
  vec_safe_push (*init, ctor);
}
+  tree array_type = deduce_array_p ? TREE_TYPE (type) : type;


I'd call this variable elt_type.


Right, and it should be inside the block below.


  tree &elt = (**init)[0];
  /* The C++20 'new T[](e_0, ..., e_k)' case allowed by P0960.  */
  if (!DIRECT_LIST_INIT_P (elt) && cxx_dialect >= cxx20)
{
- /* Handle new char[]("foo").  */
+ /* Handle new char[]("foo"): turn it into new char[]{"foo"}.  */
  if (vec_safe_length (*init) == 1
- && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
+ && char_type_p (TYPE_MAIN_VARIANT (array_type))
  && TREE_CODE (tree_strip_any_location_wrapper (elt))
 == STRING_CST)
-   /* Leave it alone: the string should not be wrapped in {}.  */;
+   {
+ elt = build_constructor_single (init_list_type_node, NULL_TREE, 
elt);
+ CONSTRUCTOR_IS_DIRECT_INIT (elt) = true;
+   }
  else
{
  tree ctor = build_constructor_from_vec (init_list_type_node, 
*init);


With this change, doesn't the string special case produce the same result as
the general case?


The problem is that reshape_init won't do anything for 
CONSTRUCTOR_IS_PAREN_INIT.


Ah, yes, that flag is the difference.


So the reshape_init in build_new_1 wouldn't unwrap the outermost { } around
a STRING_CST.



Perhaps reshape_init should be adjusted to do that unwrapping even when it gets
a CONSTRUCTOR_IS_PAREN_INIT CONSTRUCTOR.  But I'm not sure if it should also do
the reference_related_p unwrapping in reshape_init_r in that case.


That would make sense to me.


Done (but only for the outermost CONSTRUCTOR) in the below.  It allowed me to...


@@ -3977,9 +3972,15 @@ build_new (location_t loc, vec **placement, 
tree type,
}
}
  /* Otherwise we should have 'new T[]{e_0, ..., e_k}'.  */
-  if (BRACE_ENCLOSED_INITIALIZER_P (elt))
-   elt = reshape_init (type, elt, complain);
-  cp_complete_array_type (&type, elt, /*do_default*/false);
+  if (deduce_array_p)
+   {
+ /* Don't reshape ELT itself: we want to pass a list-initializer to
+build_new_1, even for STRING_CSTs.  */
+ tree e = elt;
+ if (BRACE_ENCLOSED_INITIALIZER_P (e))
+   e = reshape_init (type, e, complain);


The comment is unclear; this call does reshape the CONSTRUCTOR ELT points
to, it just doesn't change ELT if the reshape call returns something else.


Yea, I've amended the comment.


Why are we reshaping here, anyway?  Won't that lead to undesired brace
elision?


We have to reshape before deducing the array, otherwise we could deduce the
wrong number of elements when certain braces were omitted.  E.g. in

struct S { int x, y; };
new S[]{1, 2, 3, 4}; // braces elided, is { {1, 2}, {3, 4} }


Ah, right, we also get here for initializers written with actual braces.


we want S[2], not S[4].  A way to test it would be

struct S { int x, y; };
S *p = new S[]{1, 2, 3, 4};

void* operator new (unsigned long int size)
{
if (size != sizeof (S) * 2)
__builtin_abort ();
return __builtin_malloc (size);
}

int main () { }

I can add that too, if you want.  (It'd be safer if cp_complete_array_type
always reshaped but that's not trivial, as the original patch mentions.)
()-init-list wouldn't be reshaped because CONSTRUCTOR_IS_PAREN_INIT is set.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

Thanks,

-- >8 --
This patch corrects our handling of array new-expression with ()-init:

new int[4](1, 2, 3, 4);

should work even with the explicit array bound, and

new char[3]("so_sad");

should cause an error, but we weren't giving any.

Fixed by handling array new-expressions with ()-init in the same spot
where we deduce the array bound in array new-expression.  I'

Re: [PATCH v2] c++: Fix ICE in reshape_init with init-list [PR95164]


On 9/9/20 3:33 PM, Marek Polacek wrote:

On Mon, Sep 07, 2020 at 06:23:01PM -0400, Jason Merrill via Gcc-patches wrote:

On 9/4/20 5:39 PM, Marek Polacek wrote:

This patch fixes a long-standing bug in reshape_init_r.  Since r209314
we implement DR 1467 which handles list-initialization with a single
initializer of the same type as the target.  In this test this causes
a crash in reshape_init_r when we're processing a constructor that has
undergone the DR 1467 transformation.

Take e.g. the

foo({{1, {H{k}}}});

line in the attached test.  {H{k}} initializes the field b of H in I.
H{k} is a functional cast, so has TREE_HAS_CONSTRUCTOR set, so is
COMPOUND_LITERAL_P.  We perform the DR 1467 transformation and turn
{H{k}} into H{k}.  Then we attempt to reshape H{k} again and since
first_initializer_p is null and it's COMPOUND_LITERAL_P, we go here:

 else if (COMPOUND_LITERAL_P (stripped_init))
   gcc_assert (!BRACE_ENCLOSED_INITIALIZER_P (stripped_init));


It looks to me like the bug is here:


   /* [dcl.init.aggr]
All implicit type conversions (clause _conv_) are considered when
initializing the aggregate member with an initializer from an
initializer-list.  If the initializer can initialize a member,
the member is initialized.  Otherwise, if the member is itself a
non-empty subaggregate, brace elision is assumed and the
initializer is considered for the initialization of the first
member of the subaggregate.  */
   if (TREE_CODE (init) != CONSTRUCTOR
   /* But don't try this for the first initializer, since that would
be looking through the outermost braces; A a2 = { a1 }; is not a
valid aggregate initialization.  */
   && !first_initializer_p
   && (same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (init))
   || can_convert_arg (type, TREE_TYPE (init), init, LOOKUP_NORMAL,
   complain)))
 {
   d->cur++;
   return init;
 }


We ought to handle H{k} here, treat it as the initializer for the member,
and not get as far as the code you quote above.


Like this?  When we have a COMPOUND_LITERAL_P, then I think we don't need
to check cxx11, or CLASS_TYPE, or d.end - d.cur, because that's inherent.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

-- >8 --
This patch fixes a long-standing bug in reshape_init_r.  Since r209314
we implement DR 1467 which handles list-initialization with a single
initializer of the same type as the target.  In this test this causes
a crash in reshape_init_r when we're processing a constructor that has
undergone the DR 1467 transformation.

Take e.g. the

   foo({{1, {H{k}}}});

line in the attached test.  {H{k}} initializes the field b of H in I.
H{k} is a functional cast, so has TREE_HAS_CONSTRUCTOR set, so is
COMPOUND_LITERAL_P.  We perform the DR 1467 transformation and turn
{H{k}} into H{k}.  Then we attempt to reshape H{k} again and since
first_initializer_p is null and it's COMPOUND_LITERAL_P, we go here:

else if (COMPOUND_LITERAL_P (stripped_init))
  gcc_assert (!BRACE_ENCLOSED_INITIALIZER_P (stripped_init));

then complain about the missing braces, go to reshape_init_class and ICE
on
gcc_checking_assert (d->cur->index
 == get_class_binding (type, id));

because due to the missing { } we're looking for 'b' in H, but that's
not found.

So we have to be prepared to handle an initializer whose outer braces
have been removed due to DR 1467.
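
As a reminder of what DR 1467 does, here is a minimal sketch. The definitions of H and I are assumed for illustration (the attached test is not reproduced in this message); only the member names follow the description above.

```cpp
// Hypothetical aggregates; the real definitions live in the attached test.
struct H { int a; };
struct I { int c; H b; };

// DR 1467: list-initializing a class object from a single element of the
// same class type copies that element rather than trying to initialize
// the first member from it.
inline int dr1467_copy ()
{
  H h1{42};
  H h2{h1};
  return h2.a;
}

// {H{k}} as the initializer for member b: H{k} is a functional cast, and
// the braces around it are the ones the DR 1467 transformation removes.
inline int nested_init (int k)
{
  I i{1, {H{k}}};
  return i.b.a;
}
```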

gcc/cp/ChangeLog:

PR c++/95164
* decl.c (reshape_init_r): When we've found a missing set of braces
as a result of the DR 1467 transformation, don't reshape again.

gcc/testsuite/ChangeLog:

PR c++/95164
* g++.dg/cpp0x/initlist123.C: New test.
---
  gcc/cp/decl.c|  8 -
  gcc/testsuite/g++.dg/cpp0x/initlist123.C | 39 
  2 files changed, 46 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist123.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 31d68745844..6565cd7199b 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6466,7 +6466,13 @@ reshape_init_r (tree type, reshape_iter *d, tree 
first_initializer_p,
   non-empty subaggregate, brace elision is assumed and the
   initializer is considered for the initialization of the first
   member of the subaggregate.  */
-  if (TREE_CODE (init) != CONSTRUCTOR
+  if ((TREE_CODE (init) != CONSTRUCTOR
+   /* If we previously elided the braces around the single element
+ of an initializer list when initializing an object of the same
+ class type, don't report missing braces or reshape again.  In
+ this case the braces had been enclosing a compound literal or
+ functional cast with aggregate, e.g. {S{}} -> S{}.  */


Don't we also get here for a compound literal without elided braces? 
I'm not

[committed 3/3] analyzer: eliminate sm_context::warn_for_state in favor of a new 'warn' vfunc

This patch is yet more preliminary work towards generalizing sm-malloc.cc
beyond just malloc/free.

It eliminates sm_context::warn_for_state in favor of a new sm_context::warn
vfunc, guarded by sm_context::get_state calls.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to master as 25ef215abb1aa701db7ab173b9f2ac653cecf634.
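
The shape of the change can be sketched with toy types. Everything below (toy_state, toy_warn_context, check_double_close) is a simplified, hypothetical stand-in and not the analyzer's real interface: the point is only that the state test moves out of the context and into the caller.

```cpp
#include <string>
#include <vector>

// Hypothetical states for a file-like resource.
enum toy_state { TOY_START, TOY_OPEN, TOY_CLOSED };

struct toy_warn_context
{
  toy_state m_current;
  std::vector<std::string> m_warnings;

  explicit toy_warn_context (toy_state current) : m_current (current) {}

  toy_state get_state () const { return m_current; }

  // Old style: the context compares against the expected state itself.
  void warn_for_state (toy_state expected, const std::string &msg)
  {
    if (get_state () == expected)
      m_warnings.push_back (msg);
  }

  // New style: unconditional; the caller supplies the guard.
  void warn (const std::string &msg) { m_warnings.push_back (msg); }
};

// A caller written in the new style: a get_state test guarding warn.
inline void check_double_close (toy_warn_context &ctxt)
{
  if (ctxt.get_state () == TOY_CLOSED)
    ctxt.warn ("double close");
}
```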

gcc/analyzer/ChangeLog:
* diagnostic-manager.cc
(null_assignment_sm_context::warn_for_state): Replace with...
(null_assignment_sm_context::warn): ...this.
* engine.cc (impl_sm_context::warn_for_state): Replace with...
(impl_sm_context::warn): ...this.
* sm-file.cc (fileptr_state_machine::on_stmt): Replace
warn_for_state and on_transition calls with a get_state
test guarding warn and set_next_state calls.
* sm-malloc.cc (malloc_state_machine::on_stmt): Likewise.
* sm-pattern-test.cc (pattern_test_state_machine::on_condition):
Replace warn_for_state call with warn call.
* sm-sensitive.cc
(sensitive_state_machine::warn_for_any_exposure): Replace
warn_for_state call with a get_state test guarding a warn call.
* sm-signal.cc (signal_state_machine::on_stmt): Likewise.
* sm-taint.cc (taint_state_machine::on_stmt):  Replace
warn_for_state and on_transition calls with a get_state
test guarding warn and set_next_state calls.
* sm.h (sm_context::warn_for_state): Replace with...
(sm_context::warn): ...this.
---
 gcc/analyzer/diagnostic-manager.cc |  5 +-
 gcc/analyzer/engine.cc | 34 +
 gcc/analyzer/sm-file.cc|  9 ++--
 gcc/analyzer/sm-malloc.cc  | 77 +++---
 gcc/analyzer/sm-pattern-test.cc|  2 +-
 gcc/analyzer/sm-sensitive.cc   |  5 +-
 gcc/analyzer/sm-signal.cc  |  7 +--
 gcc/analyzer/sm-taint.cc   | 49 +++
 gcc/analyzer/sm.h  |  8 ++--
 9 files changed, 108 insertions(+), 88 deletions(-)

diff --git a/gcc/analyzer/diagnostic-manager.cc 
b/gcc/analyzer/diagnostic-manager.cc
index 6fd15c21962..4a95d4c569e 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -808,9 +808,8 @@ struct null_assignment_sm_context : public sm_context
*m_new_state));
   }
 
-  void warn_for_state (const supernode *, const gimple *,
-  tree, state_machine::state_t,
-  pending_diagnostic *d) FINAL OVERRIDE
+  void warn (const supernode *, const gimple *,
+tree, pending_diagnostic *d) FINAL OVERRIDE
   {
 delete d;
   }
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 07b1b15d195..49701b74fd4 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -254,35 +254,23 @@ public:
   to, origin_new_sval, m_eg.get_ext_state ());
   }
 
-  void warn_for_state (const supernode *snode, const gimple *stmt,
-  tree var, state_machine::state_t state,
-  pending_diagnostic *d) FINAL OVERRIDE
+  void warn (const supernode *snode, const gimple *stmt,
+tree var, pending_diagnostic *d) FINAL OVERRIDE
   {
 LOG_FUNC (get_logger ());
 gcc_assert (d); // take ownership
-
 impl_region_model_context old_ctxt
   (m_eg, m_enode_for_diag, m_old_state, m_new_state, NULL);
-state_machine::state_t current;
-if (var)
-  {
-   const svalue *var_old_sval
- = m_old_state->m_region_model->get_rvalue (var, &old_ctxt);
-   current = m_old_smap->get_state (var_old_sval, m_eg.get_ext_state ());
-  }
-else
-  current = m_old_smap->get_global_state ();
 
-if (state == current)
-  {
-   const svalue *var_old_sval
- = m_old_state->m_region_model->get_rvalue (var, &old_ctxt);
-   m_eg.get_diagnostic_manager ().add_diagnostic
- (&m_sm, m_enode_for_diag, snode, stmt, m_stmt_finder,
-  var, var_old_sval, state, d);
-  }
-else
-  delete d;
+const svalue *var_old_sval
+  = m_old_state->m_region_model->get_rvalue (var, &old_ctxt);
+state_machine::state_t current
+  = (var
+? m_old_smap->get_state (var_old_sval, m_eg.get_ext_state ())
+: m_old_smap->get_global_state ());
+m_eg.get_diagnostic_manager ().add_diagnostic
+  (&m_sm, m_enode_for_diag, snode, stmt, m_stmt_finder,
+   var, var_old_sval, current, d);
   }
 
   /* Hook for picking more readable trees for SSA names of temporaries,
diff --git a/gcc/analyzer/sm-file.cc b/gcc/analyzer/sm-file.cc
index 33b445195d5..58a0fd461fa 100644
--- a/gcc/analyzer/sm-file.cc
+++ b/gcc/analyzer/sm-file.cc
@@ -344,9 +344,12 @@ fileptr_state_machine::on_stmt (sm_context *sm_ctxt,
 
sm_ctxt->on_transition (node, stmt , arg, m_nonnull, m_closed);
 
-   sm_ctxt->warn_for_state (node, stmt, arg, m_closed,
-

[committed 2/3] analyzer: reimplement on_transition in terms of get_state/set_next_state

This patch is further preliminary work towards generalizing sm-malloc.cc
beyond just malloc/free.

Reimplement sm_context's on_transition vfunc in terms of new get_state
and set_next_state vfuncs, so that in followup patches we can implement
richer transitions (e.g. where the states are parametrized by
allocator).

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to master as 6d9ca8c8604e2e7c2403794baf691b260cc71fb9.
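
A minimal sketch of the refactoring, assuming toy types rather than the real sm_context signature (which also takes the supernode, stmt, var and origin): on_transition stops being a pure virtual and becomes a plain member built on the two new primitives.

```cpp
// Hypothetical states; the real state machines define their own.
enum toy_state { TOY_START, TOY_OPEN, TOY_CLOSED };

class toy_context_base
{
public:
  virtual ~toy_context_base () {}

  // The two new pure virtual primitives.
  virtual toy_state get_state () const = 0;
  virtual void set_next_state (toy_state to) = 0;

  // No longer virtual: expressed in terms of the primitives above.
  void on_transition (toy_state from, toy_state to)
  {
    if (get_state () == from)
      set_next_state (to);
  }
};

// A trivial implementation that just stores one state.
class toy_impl_context : public toy_context_base
{
public:
  explicit toy_impl_context (toy_state s) : m_state (s) {}
  toy_state get_state () const override { return m_state; }
  void set_next_state (toy_state to) override { m_state = to; }

private:
  toy_state m_state;
};
```

With the primitives exposed, a state machine can also call get_state and set_next_state directly when a simple from/to pair is not expressive enough.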

gcc/analyzer/ChangeLog:
* diagnostic-manager.cc
(null_assignment_sm_context::null_assignment_sm_context): Add old_state
and ext_state params, initializing m_old_state and m_ext_state.
(null_assignment_sm_context::on_transition): Split into...
(null_assignment_sm_context::get_state): ...this new vfunc
implementation and...
(null_assignment_sm_context::set_next_state): ...this new vfunc
implementation.
(null_assignment_sm_context::m_old_state): New field.
(null_assignment_sm_context::m_ext_state): New field.
(diagnostic_manager::add_events_for_eedge): Pass in old state and
ext_state when creating sm_ctxt.
* engine.cc (impl_sm_context::on_transition): Split into...
(impl_sm_context::get_state): ...this new vfunc
implementation and...
(impl_sm_context::set_next_state): ...this new vfunc
implementation.
* sm.h (sm_context::get_state): New pure virtual function.
(sm_context::set_next_state): Likewise.
(sm_context::on_transition): Convert from a pure virtual function
to a regular function implemented in terms of get_state and
set_next_state.
---
 gcc/analyzer/diagnostic-manager.cc | 41 +++
 gcc/analyzer/engine.cc | 45 +++---
 gcc/analyzer/sm.h  | 27 ++
 3 files changed, 80 insertions(+), 33 deletions(-)

diff --git a/gcc/analyzer/diagnostic-manager.cc 
b/gcc/analyzer/diagnostic-manager.cc
index 04c7d2ac4d3..6fd15c21962 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -754,12 +754,15 @@ struct null_assignment_sm_context : public sm_context
 {
   null_assignment_sm_context (int sm_idx,
  const state_machine &sm,
+ const program_state *old_state,
  const program_state *new_state,
  const gimple *stmt,
  const program_point *point,
- checker_path *emission_path)
-  : sm_context (sm_idx, sm), m_new_state (new_state),
-m_stmt (stmt), m_point (point), m_emission_path (emission_path)
+ checker_path *emission_path,
+ const extrinsic_state &ext_state)
+  : sm_context (sm_idx, sm), m_old_state (old_state), m_new_state (new_state),
+m_stmt (stmt), m_point (point), m_emission_path (emission_path),
+m_ext_state (ext_state)
   {
   }
 
@@ -768,13 +771,25 @@ struct null_assignment_sm_context : public sm_context
 return NULL_TREE;
   }
 
-  void on_transition (const supernode *node ATTRIBUTE_UNUSED,
- const gimple *stmt ATTRIBUTE_UNUSED,
- tree var,
- state_machine::state_t from,
- state_machine::state_t to,
- tree origin ATTRIBUTE_UNUSED) FINAL OVERRIDE
+  state_machine::state_t get_state (const gimple *stmt ATTRIBUTE_UNUSED,
+   tree var) FINAL OVERRIDE
   {
+const svalue *var_old_sval
+  = m_old_state->m_region_model->get_rvalue (var, NULL);
+const sm_state_map *old_smap = m_old_state->m_checker_states[m_sm_idx];
+
+state_machine::state_t current
+  = old_smap->get_state (var_old_sval, m_ext_state);
+
+return current;
+  }
+
+  void set_next_state (const gimple *stmt,
+  tree var,
+  state_machine::state_t to,
+  tree origin ATTRIBUTE_UNUSED) FINAL OVERRIDE
+  {
+state_machine::state_t from = get_state (stmt, var);
 if (from != m_sm.get_start_state ())
   return;
 
@@ -791,7 +806,6 @@ struct null_assignment_sm_context : public sm_context
from, to,
NULL,
*m_new_state));
-
   }
 
   void warn_for_state (const supernode *, const gimple *,
@@ -833,11 +847,13 @@ struct null_assignment_sm_context : public sm_context
 return NULL_TREE;
   }
 
+  const program_state *m_old_state;
   const program_state *m_new_state;
   const gimple *m_stmt;
   const program_point *m_point;
   state_change_visitor *m_visitor;
   checker_path *m_emission_path;
+  const extrinsic_state &m_ext_state;
 };
 
 /* Subroutine of diagnostic_manager::build_emission_path.
@@ -943,15 +959,18 @@ di

[committed 1/3] analyzer: use objects for state_machine::state_t

This patch is preliminary work towards generalizing sm-malloc.cc so that
it can check APIs other than just malloc/free (and e.g. detect
mismatching alloc/dealloc pairs).

Generalize states in state machines so that, rather than state_t being
just an "unsigned", it becomes a "const state *", where the underlying
state objects are immutable objects managed by the state machine in
question, and can e.g. have vfuncs and extra fields.  The start state
m_start becomes a member of the state_machine base class.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to master as 10fc42a8396072912e9d9d940fba25950b3fdfc5.
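
Roughly, the new representation looks like the sketch below. This is a toy model under assumptions, not the code in sm.h: toy_state_machine and its use of std::unique_ptr are illustrative choices, while the names state, state_t, add_state, get_start_state, get_name and get_id follow the ChangeLog.

```cpp
#include <memory>
#include <string>
#include <vector>

class toy_state_machine
{
public:
  // Immutable state object; subclasses could add vfuncs and fields.
  class state
  {
  public:
    state (const std::string &name, unsigned id)
      : m_name (name), m_id (id) {}
    virtual ~state () {}
    const std::string &get_name () const { return m_name; }
    unsigned get_id () const { return m_id; }

  private:
    std::string m_name;
    unsigned m_id;
  };

  // Previously this would have been a plain "unsigned".
  typedef const state *state_t;

  toy_state_machine () : m_states (), m_start (add_state ("start")) {}

  // The machine owns its states; callers only see const pointers.
  state_t add_state (const std::string &name)
  {
    const unsigned id = static_cast<unsigned> (m_states.size ());
    m_states.push_back (std::unique_ptr<state> (new state (name, id)));
    return m_states.back ().get ();
  }

  state_t get_start_state () const { return m_start; }

private:
  // Declared before m_start so it is constructed first.
  std::vector<std::unique_ptr<state> > m_states;
  state_t m_start;
};
```

Comparing two state_t values is then a pointer comparison, and get_id gives a stable integer for hashing or summing.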

gcc/analyzer/ChangeLog:
* checker-path.cc (state_change_event::get_desc): Update
state_machine::get_state_name calls to state::get_name.
(warning_event::get_desc): Likewise.
* diagnostic-manager.cc
(null_assignment_sm_context::on_transition): Update comparison
against 0 with comparison with m_sm.get_start_state.
(diagnostic_manager::prune_for_sm_diagnostic): Update
state_machine::get_state_name calls to state::get_name.
* engine.cc (impl_sm_context::on_transition): Likewise.
(exploded_node::get_dot_fillcolor): Use get_id when summing
the sm states.
* program-state.cc (sm_state_map::sm_state_map): Don't hardcode
0 as the start state when initializing m_global_state.
(sm_state_map::print): Use dump_to_pp rather than get_state_name
when dumping states.
(sm_state_map::is_empty_p): Don't hardcode 0 as the start state
when examining m_global_state.
(sm_state_map::hash): Use get_id when hashing states.
(selftest::test_sm_state_map): Use state objects rather than
arbitrary hardcoded integers.
(selftest::test_program_state_merging): Likewise.
(selftest::test_program_state_merging_2): Likewise.
* sm-file.cc (fileptr_state_machine::m_start): Move to base class.
(file_diagnostic::describe_state_change): Use get_start_state.
(fileptr_state_machine::fileptr_state_machine): Drop m_start
initialization.
* sm-malloc.cc (malloc_state_machine::m_start): Move to base
class.
(malloc_diagnostic::describe_state_change): Use get_start_state.
(possible_null::describe_state_change): Likewise.
(malloc_state_machine::malloc_state_machine): Drop m_start
initialization.
* sm-pattern-test.cc (pattern_test_state_machine::m_start): Move
to base class.
(pattern_test_state_machine::pattern_test_state_machine): Drop
m_start initialization.
* sm-sensitive.cc (sensitive_state_machine::m_start): Move to base
class.
(sensitive_state_machine::sensitive_state_machine): Drop m_start
initialization.
* sm-signal.cc (signal_state_machine::m_start): Move to base
class.
(signal_state_machine::signal_state_machine): Drop m_start
initialization.
* sm-taint.cc (taint_state_machine::m_start): Move to base class.
(taint_state_machine::taint_state_machine): Drop m_start
initialization.
* sm.cc (state_machine::state::dump_to_pp): New.
(state_machine::state_machine): Move here from sm.h.  Initialize
m_next_state_id and m_start.
(state_machine::add_state): Reimplement in terms of state objects.
(state_machine::get_state_name): Delete.
(state_machine::get_state_by_name): Reimplement in terms of state
objects.  Make const.
(state_machine::validate): Delete.
(state_machine::dump_to_pp): Reimplement in terms of state
objects.
* sm.h (state_machine::state): New class.
(state_machine::state_t): Convert typedef from "unsigned" to
"const state_machine::state *".
(state_machine::state_machine): Move to sm.cc.
(state_machine::get_default_state): Use m_start rather than
hardcoding 0.
(state_machine::get_state_name): Delete.
(state_machine::get_state_by_name): Make const.
(state_machine::get_start_state): New accessor.
(state_machine::alloc_state_id): New.
(state_machine::m_state_names): Drop in favor of...
(state_machine::m_states): New field
(state_machine::m_start): New field
(start_start_p): Delete.
---
 gcc/analyzer/checker-path.cc   | 25 +--
 gcc/analyzer/diagnostic-manager.cc | 10 ++---
 gcc/analyzer/engine.cc |  8 ++--
 gcc/analyzer/program-state.cc  | 72 ++
 gcc/analyzer/sm-file.cc|  6 +--
 gcc/analyzer/sm-malloc.cc  |  8 +---
 gcc/analyzer/sm-pattern-test.cc|  4 --
 gcc/analyzer/sm-sensitive.cc   |  4 --
 gcc/analyzer/sm-signal.cc  |  4 --
 gcc/analyzer/sm-taint.cc   |  4 --
 gcc/analyzer/sm.cc | 62 ++---
 gcc/analyzer/sm.h  | 48

Re: [PATCH V2 0/4] Unify C and C++ handling of loops and switches


On 8/13/20 12:34 PM, Sandra Loosemore wrote:

This is a revised version of the patch set originally posted
last November:

https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534142.html

In addition to generally updating and rebasing the patches to reflect
other changes on mainline in the meantime, for this version I have
switched to using the C lowering strategy (directly to goto form)
rather than the C++ one (to LOOP_EXPR) because of regressions in the C
optimization tests.  Besides the ones previously noted in the original
patch submission, there were a bunch of new ones since November.  Some
of them were trivial to fix (e.g., flipping branch probabilities to
reflect the different sense of the loop exit condition in the
C++-style output), but I wasn't making much progress on others and
eventually decided to pursue the "plan B" of using the C-style output
everywhere, as discussed here:

https://gcc.gnu.org/pipermail/gcc-patches/2019-December/536536.html

The only regression I ran into with this was a bootstrap failure
building the Fortran front end from a new -Wmaybe-uninitialized error.
This might be a false positive but part 3 of the new series works
around it by adding an assertion to give g++ a hint.  Unfortunately I
had no luck in trying to reduce this to a standalone test case, but I
did observe that the failure went away when I compiled that file with
debugging enabled.  :-S  I could file a PR to look into this further if
the workaround is good enough for now.


My impression from Jeff's analysis in January and David's in March was 
that many of the testsuite changes were from the C++ approach actually 
providing better results, so the reversal here surprises me.  Can you 
talk more about the regressions you're seeing?


Jason

[PATCH] rs6000: Fix instruction type

2020-09-09 Thread Pat Haugen via Gcc-patches

I noticed that some of the VSR<->GPR move instructions are not typed
correctly. This patch fixes those instructions so that the scheduler
treats them with the correct latency.

Bootstrap/regtest on powerpc64le with no new regressions. Also ran a
CPU2017 benchmark comparison on Power9 with no major differences (a
couple of minor improvements and no degradations). Ok for trunk?

-Pat


2020-09-09  Pat Haugen  

gcc/
* gcc/config/rs6000/rs6000.md
(lfiwzx, floatunssi2_lfiwzx, p8_mtvsrwz, p8_mtvsrd_sf): Fix insn
type.
* gcc/config/rs6000/vsx.md
(vsx_concat_, vsx_splat__reg, vsx_splat_v4sf): Likewise.



diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 43b620ae1c0..f902c864c26 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5483,7 +5483,7 @@ (define_insn "lfiwzx"
lxsiwzx %x0,%y1
mtvsrwz %x0,%1
xxextractuw %x0,%x1,4"
-  [(set_attr "type" "fpload,fpload,mftgpr,vecexts")
+  [(set_attr "type" "fpload,fpload,mffgpr,vecexts")
(set_attr "isa" "*,p8v,p8v,p9v")])

 (define_insn_and_split "floatunssi2_lfiwzx"
@@ -7634,7 +7634,7 @@ (define_insn_and_split "movsf_from_si"
 *,  12,*, *")
(set_attr "type"
"load,   fpload,fpload,fpload,store, fpstore,
-fpstore,vecfloat,  mffgpr,*")
+fpstore,vecfloat,  mftgpr,*")
(set_attr "isa"
"*,  *, p9v,   p8v,   *, *,
 p8v,p8v,   p8v,   *")])
@@ -8711,7 +8711,7 @@ (define_insn "p8_mtvsrwz"
   UNSPEC_P8V_MTVSRWZ))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mtvsrwz %x0,%1"
-  [(set_attr "type" "mftgpr")])
+  [(set_attr "type" "mffgpr")])

 (define_insn_and_split "reload_fpr_from_gpr"
   [(set (match_operand:FMOVE64X 0 "register_operand" "=d")
@@ -8810,7 +8810,7 @@ (define_insn "p8_mtvsrd_sf"
   UNSPEC_P8V_MTVSRD))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mtvsrd %x0,%1"
-  [(set_attr "type" "mftgpr")])
+  [(set_attr "type" "mffgpr")])

 (define_insn_and_split "reload_vsx_from_gprsf"
   [(set (match_operand:SF 0 "register_operand" "=wa")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 54da54c43dc..3a5cf896da8 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2885,7 +2885,7 @@ (define_insn "vsx_concat_"
   else
 gcc_unreachable ();
 }
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,vecmove")])

 ;; Combiner patterns to allow creating XXPERMDI's to access either double
 ;; word element in a vector register.
@@ -4440,7 +4440,7 @@ (define_insn "vsx_splat__reg"
   "@
xxpermdi %x0,%x1,%x1,0
mtvsrdd %x0,%1,%1"
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,vecmove")])

 (define_insn "vsx_splat__mem"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
@@ -4493,7 +4493,7 @@ (define_insn_and_split "vsx_splat_v4sf"
(unspec:V4SF [(match_dup 0)
  (const_int 0)] UNSPEC_VSX_XXSPLTW))]
   ""
-  [(set_attr "type" "vecload,vecperm,mftgpr")
+  [(set_attr "type" "vecload,vecperm,vecperm")
(set_attr "length" "*,8,*")
(set_attr "isa" "*,p8v,*")])

[committed] analyzer: generalize sm-malloc to new/delete [PR94355]

This patch generalizes the state machine in sm-malloc.cc to support
multiple allocator APIs, and adds just enough support for C++ new and
delete to demonstrate the feature, allowing for detection of code
paths where the result of new in C++ can leak - for some crude examples,
at least (bearing in mind that the analyzer doesn't yet know about
e.g. vfuncs, exceptions, inheritance, RTTI, etc.).

It also implements a new warning: -Wanalyzer-mismatching-deallocation.
For example:

demo.cc: In function 'void test()':
demo.cc:8:8: warning: 'f' should have been deallocated with 'delete'
  but was deallocated with 'free' [CWE-762] [-Wanalyzer-mismatching-deallocation]
8 |   free (f);
  |   ~^~~
  'void test()': events 1-2
|
|7 |   foo *f = new foo;
|  |^~~
|  ||
|  |(1) allocated here (expects deallocation with 'delete')
|8 |   free (f);
|  |   
|  ||
|  |(2) deallocated with 'free' here; allocation at (1) expects deallocation with 'delete'
|
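
The demo.cc source is not included in this message; a plausible reconstruction of the kind of code that triggers the warning is sketched below. The definition of foo and both function names are assumptions, and the line numbers will not match the diagnostic above.

```cpp
#include <cstdlib>

// Hypothetical type; the original demo.cc is not shown.
struct foo { int m_value = 0; };

// Mismatched pairing: allocated with 'new', released with 'free'.
// This is the pattern the new warning is meant to report; the function
// is shown for illustration only and should not be called.
inline void test_mismatched ()
{
  foo *f = new foo;
  free (f);
}

// Matched pairing: allocated with 'new', released with 'delete'.
inline int test_matched ()
{
  foo *f = new foo;
  int v = f->m_value;
  delete f;
  return v;
}
```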

The patch also adds just enough knowledge of exception-handling to
suppress a false positive from -Wanalyzer-malloc-leak on
g++.dg/analyzer/pr96723.C on the exception-handling CFG edge after
operator new.  It does this by adding a constraint that the result is
NULL if an exception was thrown from operator new, since the result from
operator new is lost when following that exception-handling CFG edge.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to master as r11-3090-g1690a839cff2e0276017a013419d81d675bbf69d.

gcc/analyzer/ChangeLog:
PR analyzer/94355
* analyzer.opt (Wanalyzer-mismatching-deallocation): New warning.
* region-model-impl-calls.cc
(region_model::impl_call_operator_new): New.
(region_model::impl_call_operator_delete): New.
* region-model.cc (region_model::on_call_pre): Detect operator new
and operator delete.
(region_model::on_call_post): Likewise.
(region_model::maybe_update_for_edge): Detect EH edges and call...
(region_model::apply_constraints_for_exception): New function.
* region-model.h (region_model::impl_call_operator_new): New decl.
(region_model::impl_call_operator_delete): New decl.
(region_model::apply_constraints_for_exception): New decl.
* sm-malloc.cc (enum resource_state): New.
(struct allocation_state): New state subclass.
(enum wording): New.
(struct api): New.
(malloc_state_machine::custom_data_t): New typedef.
(malloc_state_machine::add_state): New decl.
(malloc_state_machine::m_unchecked)
(malloc_state_machine::m_nonnull)
(malloc_state_machine::m_freed): Delete these states in favor
of...
(malloc_state_machine::m_malloc)
(malloc_state_machine::m_scalar_new)
(malloc_state_machine::m_vector_new): ...this new api instances,
which own their own versions of these states.
(malloc_state_machine::on_allocator_call): New decl.
(malloc_state_machine::on_deallocator_call): New decl.
(api::api): New ctor.
(dyn_cast_allocation_state): New.
(as_a_allocation_state): New.
(get_rs): New.
(unchecked_p): New.
(nonnull_p): New.
(freed_p): New.
(malloc_diagnostic::describe_state_change): Use unchecked_p and
nonnull_p.
(class mismatching_deallocation): New.
(double_free::double_free): Add funcname param for initializing
m_funcname.
(double_free::emit): Use m_funcname in warning message rather
than hardcoding "free".
(double_free::describe_state_change): Likewise.  Use freed_p.
(double_free::describe_call_with_state): Use freed_p.
(double_free::describe_final_event): Use m_funcname in message
rather than hardcoding "free".
(double_free::m_funcname): New field.
(possible_null::describe_state_change): Use unchecked_p.
(possible_null::describe_return_of_state): Likewise.
(use_after_free::use_after_free): Add param for initializing m_api.
(use_after_free::emit): Use m_api->m_dealloc_funcname in message
rather than hardcoding "free".
(use_after_free::describe_state_change): Use freed_p.  Change the
wording of the message based on the API.
(use_after_free::describe_final_event): Use
m_api->m_dealloc_funcname in message rather than hardcoding
"free".  Change the wording of the message based on the API.
(use_after_free::m_api): New field.
(malloc_leak::describe_state_change): Use unchecked_p.  Update
for renaming of m_malloc_event to m_alloc_event.
(malloc_leak::describe_final_event): Update for renaming of
m_malloc_event to m_alloc_event.
(malloc_leak::m_malloc_event): Rename...
(mal

Re: [PATCH v3] c++: Further tweaks for new-expression and paren-init [PR77841]

On Wed, Sep 09, 2020 at 05:02:24PM -0400, Jason Merrill wrote:
> On 9/8/20 10:34 PM, Marek Polacek wrote:
> > On Tue, Sep 08, 2020 at 04:19:42PM -0400, Jason Merrill wrote:
> > > On 9/8/20 4:06 PM, Marek Polacek wrote:
> > > > On Mon, Sep 07, 2020 at 11:19:47PM -0400, Jason Merrill wrote:
> > > > > On 9/6/20 11:34 AM, Marek Polacek wrote:
> > > > > > @@ -3944,9 +3935,9 @@ build_new (location_t loc, vec 
> > > > > > **placement, tree type,
> > > > > > }
> > > > > >   /* P1009: Array size deduction in new-expressions.  */
> > > > > > -  if (TREE_CODE (type) == ARRAY_TYPE
> > > > > > -  && !TYPE_DOMAIN (type)
> > > > > > -  && *init)
> > > > > > +  const bool deduce_array_p = (TREE_CODE (type) == ARRAY_TYPE
> > > > > > +  && !TYPE_DOMAIN (type));
> > > > > > +  if (*init && (deduce_array_p || (nelts && cxx_dialect >= cxx20)))
> > > > > 
> > > > > Looks like this won't handle new (char[4]), for which we also get an
> > > > > ARRAY_TYPE.
> > > > 
> > > > Good catch.  Fixed & paren-init37.C added.
> > > > 
> > > > > > {
> > > > > >   /* This means we have 'new T[]()'.  */
> > > > > >   if ((*init)->is_empty ())
> > > > > > @@ -3955,16 +3946,20 @@ build_new (location_t loc, vec 
> > > > > > **placement, tree type,
> > > > > >   CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
> > > > > >   vec_safe_push (*init, ctor);
> > > > > > }
> > > > > > +  tree array_type = deduce_array_p ? TREE_TYPE (type) : type;
> > > > > 
> > > > > I'd call this variable elt_type.
> > > > 
> > > > Right, and it should be inside the block below.
> > > > 
> > > > > > >   tree &elt = (**init)[0];
> > > > > > >   /* The C++20 'new T[](e_0, ..., e_k)' case allowed by P0960.  */
> > > > > > >   if (!DIRECT_LIST_INIT_P (elt) && cxx_dialect >= cxx20)
> > > > > > > {
> > > > > > > - /* Handle new char[]("foo").  */
> > > > > > > + /* Handle new char[]("foo"): turn it into new char[]{"foo"}.  */
> > > > > > >   if (vec_safe_length (*init) == 1
> > > > > > > - && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
> > > > > > > + && char_type_p (TYPE_MAIN_VARIANT (array_type))
> > > > > > >   && TREE_CODE (tree_strip_any_location_wrapper (elt))
> > > > > > >  == STRING_CST)
> > > > > > > -   /* Leave it alone: the string should not be wrapped in {}.  */;
> > > > > > > +   {
> > > > > > > + elt = build_constructor_single (init_list_type_node, NULL_TREE, elt);
> > > > > > > + CONSTRUCTOR_IS_DIRECT_INIT (elt) = true;
> > > > > > > +   }
> > > > > > >   else
> > > > > > > {
> > > > > > >   tree ctor = build_constructor_from_vec (init_list_type_node, *init);
> > > > > 
> > > > > > With this change, doesn't the string special case produce the same
> > > > > > result as the general case?
> > > > > 
> > > > > The problem is that reshape_init won't do anything for
> > > > > CONSTRUCTOR_IS_PAREN_INIT.
> > > > 
> > > > Ah, yes, that flag is the difference.
> > > > 
> > > > > So the reshape_init in build_new_1 wouldn't unwrap the outermost { }
> > > > > around a STRING_CST.
> > > > 
> > > > > Perhaps reshape_init should be adjusted to do that unwrapping even when
> > > > > it gets a CONSTRUCTOR_IS_PAREN_INIT CONSTRUCTOR.  But I'm not sure if it
> > > > > should also do the reference_related_p unwrapping in reshape_init_r in
> > > > > that case.
> > > > 
> > > > That would make sense to me.
> > > 
> > > Done (but only for the outermost CONSTRUCTOR) in the below.  It allowed me
> > > to...
> > 
> > > > > > @@ -3977,9 +3972,15 @@ build_new (location_t loc, vec 
> > > > > > **placement, tree type,
> > > > > > }
> > > > > > }
> > > > > >   /* Otherwise we should have 'new T[]{e_0, ..., e_k}'.  */
> > > > > > -  if (BRACE_ENCLOSED_INITIALIZER_P (elt))
> > > > > > -   elt = reshape_init (type, elt, complain);
> > > > > > -  cp_complete_array_type (&type, elt, /*do_default*/false);
> > > > > > +  if (deduce_array_p)
> > > > > > +   {
> > > > > > + /* Don't reshape ELT itself: we want to pass a 
> > > > > > list-initializer to
> > > > > > +build_new_1, even for STRING_CSTs.  */
> > > > > > + tree e = elt;
> > > > > > + if (BRACE_ENCLOSED_INITIALIZER_P (e))
> > > > > > +   e = reshape_init (type, e, complain);
> > > > > 
> > > > > The comment is unclear; this call does reshape the CONSTRUCTOR ELT 
> > > > > points
> > > > > to, it just doesn't change ELT if the reshape call returns something 
> > > > > else.
> > > > 
> > > > Yea, I've amended the comment.
> > > > 
> > > > > Why are we reshaping here, anyway?  Won't that lead to undesired brace
> > > > > elision?
> > > > 
> > > > > We have to reshape before deducing the array, otherwise we could deduce
> > > > > the wrong number of eleme

[PING 2][PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters


Joseph, do you have any concerns with or comments on the most
recent patch or is it okay as is?

https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552266.html

Martin

On 9/2/20 6:03 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552266.html

On 8/25/20 12:44 PM, Martin Sebor wrote:

Joseph, do you have any more comments on the rest of the most recent
revision of the patch?

https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552266.html

Martin

On 8/19/20 6:09 PM, Joseph Myers wrote:

On Wed, 19 Aug 2020, Martin Sebor via Gcc-patches wrote:

I think you need a while loop there, not just an if, to account for the
case of multiple consecutive cdk_attrs.  At least the GNU attribute syntax


 direct-declarator:
[...]
   ( gnu-attributes[opt] declarator )

should produce multiple consecutive cdk_attrs for each level of
parentheses with attributes inside.


I had considered a loop but couldn't find a way to trigger what you
describe (or a test in the testsuite that would do it) so I didn't
use one.  I saw loops like that in other places but I couldn't get
even those to uncover such a test case.  Here's what I tried:

   #define A(N) __attribute__ ((aligned (N), may_alias))
   int n;
   void f (int (* A (2) A (4) (* A (2) A (4) (* A (2) A (4) [n])[n])));

Sequences of consecutive attributes are all chained together.

I've added the loop here but I have no test for it.  It would be
good to add one if it really is needed.


The sort of thing I'm thinking of would be, where A is some attribute:

void f (int (A (A (A arg))));

(that example doesn't involve an array, but it illustrates the syntax I'd
expect to produce multiple consecutive cdk_attrs).
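
The difference between an 'if' and a 'while' here can be sketched with a
much-simplified, hypothetical declarator chain.  The struct and enum below
are illustrative stand-ins, not the C front end's c_declarator; the point
is only that a loop is needed to step over every consecutive attribute
node.

```cpp
#include <cassert>

// Hypothetical, simplified declarator chain (illustrative names only).
enum declarator_kind { dk_attrs, dk_pointer, dk_array, dk_id };

struct declarator
{
  declarator_kind kind;
  const declarator *inner;
};

// Skip every leading attribute node.  An 'if' would skip only the first.
const declarator *skip_attrs (const declarator *d)
{
  while (d && d->kind == dk_attrs)
    d = d->inner;
  return d;
}

// Models a declarator with three nested attribute levels around an
// identifier, and returns the kind reached after skipping them.
declarator_kind kind_after_skipping_three_attrs ()
{
  const declarator id = { dk_id, nullptr };
  const declarator a3 = { dk_attrs, &id };
  const declarator a2 = { dk_attrs, &a3 };
  const declarator a1 = { dk_attrs, &a2 };
  return skip_attrs (&a1)->kind;
}
```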

Re: [PATCH] rs6000: Fix instruction type

Hi!

On Wed, Sep 09, 2020 at 04:14:37PM -0500, Pat Haugen wrote:
> I noticed that some of the VSR<->GPR move instructions are not typed
> correctly. This patch fixes those instructions so that the scheduler
> treats them with the correct latency.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -5483,7 +5483,7 @@ (define_insn "lfiwzx"
> lxsiwzx %x0,%y1
> mtvsrwz %x0,%1
> xxextractuw %x0,%x1,4"
> -  [(set_attr "type" "fpload,fpload,mftgpr,vecexts")
> +  [(set_attr "type" "fpload,fpload,mffgpr,vecexts")
> (set_attr "isa" "*,p8v,p8v,p9v")])

Can we rename mftgpr/mffgpr globally?  Maybe even as mfvsr and mtvsr,
because that is what is actually modeled here?  Such names will make it
much harder to get confused and use the wrong type, too :-)

> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 54da54c43dc..3a5cf896da8 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -2885,7 +2885,7 @@ (define_insn "vsx_concat_<mode>"
>else
>  gcc_unreachable ();
>  }
> -  [(set_attr "type" "vecperm")])
> +  [(set_attr "type" "vecperm,vecmove")])

mtvsrdd is a mtvsr, sorry, mffgpr just the same?  It isn't vecmove?

> @@ -4440,7 +4440,7 @@ (define_insn "vsx_splat_<mode>_reg"
>"@
> xxpermdi %x0,%x1,%x1,0
> mtvsrdd %x0,%1,%1"
> -  [(set_attr "type" "vecperm")])
> +  [(set_attr "type" "vecperm,vecmove")])

Same here.


Segher

[PING][PATCH] use get_size_range to get allocated size (PR 92942)


Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552903.html

On 8/28/20 11:12 AM, Martin Sebor wrote:

The gimple_call_alloc_size() function that determines the range
of sizes of allocated objects and constrains the bounds in calls
to functions like memcpy calls get_range() instead of
get_size_range() to obtain its result.  The latter is the right
function to call because it has the necessary logic to constrain
the range to just the values that are valid for object sizes.
This is especially useful when the range is the result of
a conversion from a signed to a wider unsigned integer where
the upper subrange is excessive and can be eliminated such as in:

   char* f (int n)
   {
     if (n > 8)
       n = 8;
     char *p = malloc (n);
     strcpy (p, "0123456789");   // buffer overflow
     ...
   }

Attached is a fix that lets -Wstringop-overflow diagnose the buffer
overflow above.  Besides with GCC I have also tested the change by
building Binutils/GDB and Glibc and verifying that it doesn't
introduce any false positives.
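
A small sketch of why the upper subrange can be discarded: converting a
negative int to size_t wraps to a huge value, and, on the assumption that
no object can exceed PTRDIFF_MAX bytes, such values are not valid object
sizes.  The helper names are hypothetical and are not the GCC functions
mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: valid object sizes never exceed PTRDIFF_MAX.
bool valid_object_size_p (std::size_t n)
{
  return n <= static_cast<std::size_t> (PTRDIFF_MAX);
}

// The implicit conversion that happens when an int is passed to malloc.
std::size_t as_size (int n)
{
  return static_cast<std::size_t> (n);
}
```

With n limited to at most 8, the converted values are either in [0, 8] or
in the wrapped-around upper subrange; only the former pass the check, so
the allocation can be treated as at most 8 bytes.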

Martin

Re: [PATCH v3] c++: Further tweaks for new-expression and paren-init [PR77841]


On 9/9/20 5:35 PM, Marek Polacek wrote:

On Wed, Sep 09, 2020 at 05:02:24PM -0400, Jason Merrill wrote:

On 9/8/20 10:34 PM, Marek Polacek wrote:

On Tue, Sep 08, 2020 at 04:19:42PM -0400, Jason Merrill wrote:

On 9/8/20 4:06 PM, Marek Polacek wrote:

On Mon, Sep 07, 2020 at 11:19:47PM -0400, Jason Merrill wrote:

On 9/6/20 11:34 AM, Marek Polacek wrote:

@@ -3944,9 +3935,9 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
 }
   /* P1009: Array size deduction in new-expressions.  */
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && !TYPE_DOMAIN (type)
-  && *init)
+  const bool deduce_array_p = (TREE_CODE (type) == ARRAY_TYPE
+  && !TYPE_DOMAIN (type));
+  if (*init && (deduce_array_p || (nelts && cxx_dialect >= cxx20)))


Looks like this won't handle new (char[4]), for which we also get an
ARRAY_TYPE.


Good catch.  Fixed & paren-init37.C added.


 {
   /* This means we have 'new T[]()'.  */
   if ((*init)->is_empty ())
@@ -3955,16 +3946,20 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
  CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
  vec_safe_push (*init, ctor);
}
+  tree array_type = deduce_array_p ? TREE_TYPE (type) : type;


I'd call this variable elt_type.


Right, and it should be inside the block below.


   tree &elt = (**init)[0];
   /* The C++20 'new T[](e_0, ..., e_k)' case allowed by P0960.  */
   if (!DIRECT_LIST_INIT_P (elt) && cxx_dialect >= cxx20)
{
- /* Handle new char[]("foo").  */
+ /* Handle new char[]("foo"): turn it into new char[]{"foo"}.  */
  if (vec_safe_length (*init) == 1
- && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
+ && char_type_p (TYPE_MAIN_VARIANT (array_type))
  && TREE_CODE (tree_strip_any_location_wrapper (elt))
 == STRING_CST)
-   /* Leave it alone: the string should not be wrapped in {}.  */;
+   {
+ elt = build_constructor_single (init_list_type_node, NULL_TREE, elt);
+ CONSTRUCTOR_IS_DIRECT_INIT (elt) = true;
+   }
  else
{
  tree ctor = build_constructor_from_vec (init_list_type_node, *init);


With this change, doesn't the string special case produce the same result as
the general case?


The problem is that reshape_init won't do anything for 
CONSTRUCTOR_IS_PAREN_INIT.


Ah, yes, that flag is the difference.


So the reshape_init in build_new_1 wouldn't unwrap the outermost { } around
a STRING_CST.



Perhaps reshape_init should be adjusted to do that unwrapping even when it gets
a CONSTRUCTOR_IS_PAREN_INIT CONSTRUCTOR.  But I'm not sure if it should also do
the reference_related_p unwrapping in reshape_init_r in that case.


That would make sense to me.


Done (but only for the outermost CONSTRUCTOR) in the below.  It allowed me to...


@@ -3977,9 +3972,15 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
}
}
   /* Otherwise we should have 'new T[]{e_0, ..., e_k}'.  */
-  if (BRACE_ENCLOSED_INITIALIZER_P (elt))
-   elt = reshape_init (type, elt, complain);
-  cp_complete_array_type (&type, elt, /*do_default*/false);
+  if (deduce_array_p)
+   {
+ /* Don't reshape ELT itself: we want to pass a list-initializer to
+build_new_1, even for STRING_CSTs.  */
+ tree e = elt;
+ if (BRACE_ENCLOSED_INITIALIZER_P (e))
+   e = reshape_init (type, e, complain);


The comment is unclear; this call does reshape the CONSTRUCTOR ELT points
to, it just doesn't change ELT if the reshape call returns something else.


Yea, I've amended the comment.


Why are we reshaping here, anyway?  Won't that lead to undesired brace
elision?


We have to reshape before deducing the array, otherwise we could deduce the
wrong number of elements when certain braces were omitted.  E.g. in

 struct S { int x, y; };
 new S[]{1, 2, 3, 4}; // braces elided, is { {1, 2}, {3, 4} }


Ah, right, we also get here for initializers written with actual braces.


we want S[2], not S[4].  A way to test it would be

 struct S { int x, y; };
 S *p = new S[]{1, 2, 3, 4};

 void* operator new (unsigned long int size)
 {
   if (size != sizeof (S) * 2)
     __builtin_abort ();
   return __builtin_malloc (size);
 }

 int main () { }

I can add that too, if you want.  (It'd be safer if cp_complete_array_type
always reshaped but that's not trivial, as the original patch mentions.)
()-init-list wouldn't be reshaped because CONSTRUCTOR_IS_PAREN_INIT is set.
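
The effect of brace elision on the deduced bound can also be seen with an
ordinary array declaration, which avoids replacing operator new.  This is
only an analogous sketch (a static array rather than a new-expression),
reusing the struct S from the example above.

```cpp
#include <cassert>
#include <cstddef>

struct S { int x, y; };

// Braces elided: equivalent to { {1, 2}, {3, 4} }, so the bound is 2.
static const S elided[] = { 1, 2, 3, 4 };

// Number of elements deduced for the array.
std::size_t deduced_bound ()
{
  return sizeof elided / sizeof elided[0];
}

// The third initializer starts the second element.
int second_element_x ()
{
  return elided[1].x;
}
```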

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

Thanks,

-- >8 --
This patch corrects our handling of array new-expression with ()-init:

 new int[4](1, 2, 3, 4);

should work even with the explicit array bound, and

 new char[3]("so_sad");

should cause an error, but we weren't giving any.

Fixe

[PING][PATCH] improve validation of attribute arguments (PR c/78666)


Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552500.html

Aldy provided a bunch of comments on this patch but I'm still looking
for a formal approval.

Martin

On 8/24/20 10:45 AM, Martin Sebor wrote:

On 8/24/20 4:59 AM, Aldy Hernandez wrote:



On 8/21/20 1:37 AM, Martin Sebor wrote:

On 8/20/20 3:00 PM, Aldy Hernandez wrote:



Regardless, here are some random comments.


Thanks for the careful review!


diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 37214831538..bc4f409e346 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -720,6 +725,124 @@ positional_argument (const_tree fntype, const_tree atname, tree pos,

   return pos;
 }

+/* Given a pair of NODEs for arbitrary DECLs or TYPEs, validate one or
+   two integral or string attribute arguments NEWARGS to be applied to
+   NODE[0] for the absence of conflicts with the same attribute arguments
+   already applied to NODE[1]. Issue a warning for conflicts and return
+   false.  Otherwise, when no conflicts are found, return true.  */
+
+static bool
+validate_attr_args (tree node[2], tree name, tree newargs[2])


I think you're doing too much work in one function.  Also, I 
*really* dislike sending pairs of objects in arrays, especially when 
they're called something so abstract as "node" and "newargs".


Would it be possible to make this function only validate one single 
argument and call it twice?  Or do we gain something by having it do 
two things at once?


I agree about the name "node."  The argument comes from the attribute
handlers: they all take something called a node as their first argument.
It's an array of three elements:
   [0] the current declaration or type
   [1] the previous declaration or type or null
   [2] the current declaration if [0] is a type


Ah, the rest of the functions are taking a node pointer.  Your patch 
threw me off because you use a node[2] instead of a node pointer like 
the rest of the functions.  Perhaps you should keep to the current 
style and pass a node *.


It takes tree node[3] and the -Warray-parameter option (being
reviewed) uses the bound to check for out-of-bounds accesses, both
callers and the callee itself.  (C, but not C++, has special syntax
for this: tree node[static 3].)




validate_attr_args() is called with the same node as the handlers
and uses both node[0] and node[1] to recursively validate the first
one against itself and then against the second.  It could be changed
to take two arguments instead of an array (the new "node" and
the original "node," perhaps even under some better name).  That
would make it different from every handler but maybe that wouldn't
be a problem.

The newargs argument is also an array, with the second element
being optional.  Both elements are used and validated against
the attribute arguments on the same declaration first and again
on the previous one.  The array could be split up into two
distinct arguments, say newarg1 and newarg2, or passed in as
std::pair.  I'm not sure I see much of a difference
between the approaches.


It looks like node[] carries all the information for the current 
attribute and arguments, as well the same information for the previous 
attribute.  Could your validate function just take:


validate_attr_args (tree *node, tree name)

That way you can save passing a pair of arguments, plus you can save 
accumulating said arguments in the handle_* functions.


Or is there something I'm missing here that makes this unfeasible?


If the function didn't take the newargs array it would have to extract
the argument(s) from node, duplicating the work already done in
the callers.  I.e., figuring out how many arguments the attribute
expects (one or two, depending on the specific attribute), and for
handle_alloc_size_attribute, calling positional_argument (or at
a minimum default_conversion) to convert it to the expected value.
So it's feasible but doesn't seem like a good design.



  /* Extract the same attribute from the previous declaration or type.  */
  tree prevattr = NULL_TREE;
  if (DECL_P (node[1]))
    {
      prevattr = DECL_ATTRIBUTES (node[1]);
      if (!prevattr)
        {
          tree type = TREE_TYPE (node[1]);
          prevattr = TYPE_ATTRIBUTES (type);
        }
    }
  else if (TYPE_P (node[1]))
    prevattr = TYPE_ATTRIBUTES (node[1]);


If all this section does is extract the attribute from a decl, it 
would look cleaner if you abstracted out this functionality into its 
own function.  I'm a big fan of one function to do one thing.



  const char* const namestr = IDENTIFIER_POINTER (name);
  prevattr = lookup_attribute (namestr, prevattr);
  if (!prevattr)
    return true;


Perhaps a better name would be attribute_name_str?


Thanks for the suggestion but I think NAMESTR is good enough: it
should be clear enough from the function argument NAME that it
refers to the string representation of the NAME.  There also is
already a pre-existing use of NAMESTR elsewhere and so a preceden

Re: [PATCH v3] doc: change 'make check-g++' to 'make check-c++' in install.texi


On 9/9/20 6:25 AM, Hu Jiangping wrote:

This patch changes the command 'make check-g++' to 'make check-c++' in
install.texi since there is no 'make check-g++' target in the object
directory.


make check-g++ works fine for me in the object directory.  And 
gcc/cp/Make-lang.in includes



# 'make check' in gcc/ looks for check-c++, as do all toplevel C++-related
# check targets.  However, our DejaGNU framework requires 'check-g++' as its
# entry point.  We feed the former to the latter here.
check-c++ : check-g++


So this change doesn't seem like an improvement.

Jason

Re: [PATCH] rs6000: Fix instruction type

2020-09-09 Thread Pat Haugen via Gcc-patches

On 9/9/20 4:41 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Sep 09, 2020 at 04:14:37PM -0500, Pat Haugen wrote:
>> I noticed that some of the VSR<->GPR move instructions are not typed
>> correctly. This patch fixes those instructions so that the scheduler
>> treats them with the correct latency.
> 
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -5483,7 +5483,7 @@ (define_insn "lfiwzx"
>> lxsiwzx %x0,%y1
>> mtvsrwz %x0,%1
>> xxextractuw %x0,%x1,4"
>> -  [(set_attr "type" "fpload,fpload,mftgpr,vecexts")
>> +  [(set_attr "type" "fpload,fpload,mffgpr,vecexts")
>> (set_attr "isa" "*,p8v,p8v,p9v")])
> 
> Can we rename mftgpr/mffgpr globally?  Maybe even as mfvsr and mtvsr,
> because that is what is actually modeled here?  Such names will make it
> much harder to get confused and use the wrong type, too :-)
> 

Those types were originally created for the mffgpr/mftgpr Power6
instructions. But since it appears we no longer generate those insns I
totally agree with doing a global change as you suggest to make things
clearer. Would you like that as a separate patch or is it fine to
include in this one?


>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>> index 54da54c43dc..3a5cf896da8 100644
>> --- a/gcc/config/rs6000/vsx.md
>> +++ b/gcc/config/rs6000/vsx.md
>> @@ -2885,7 +2885,7 @@ (define_insn "vsx_concat_<mode>"
>>else
>>  gcc_unreachable ();
>>  }
>> -  [(set_attr "type" "vecperm")])
>> +  [(set_attr "type" "vecperm,vecmove")])
> 
> mtvsrdd is a mtvsr, sorry, mffgpr just the same?  It isn't vecmove?
> 
>> @@ -4440,7 +4440,7 @@ (define_insn "vsx_splat_<mode>_reg"
>>"@
>> xxpermdi %x0,%x1,%x1,0
>> mtvsrdd %x0,%1,%1"
>> -  [(set_attr "type" "vecperm")])
>> +  [(set_attr "type" "vecperm,vecmove")])
> 
> Same here.

mtvsrdd dispatches as a vector op, so requires a super-slice. As opposed
to the others which just require a single execution slice for Power9.

-Pat

Re: [PATCH] rs6000: Fix instruction type

On Wed, Sep 09, 2020 at 05:30:33PM -0500, Pat Haugen wrote:
> On 9/9/20 4:41 PM, Segher Boessenkool wrote:
> > On Wed, Sep 09, 2020 at 04:14:37PM -0500, Pat Haugen wrote:
> > Can we rename mftgpr/mffgpr globally?  Maybe even as mfvsr and mtvsr,
> > because that is what is actually modeled here?  Such names will make it
> > much harder to get confused and use the wrong type, too :-)
> 
> Those types were originally created for the mffgpr/mftgpr Power6
> instructions. But since it appears we no longer generate those insns I

Yes, I know ;-)

> totally agree with doing a global change as you suggest to make things
> clearer. Would you like that as a separate patch or is it fine to
> include in this one?

That will be pretty big and mechanical, so separate please.  Either before
or after this one.

power6.md still uses this attribute to describe the p6-specific insns
scheduling.  Not sure what to do with that?  Remove it, or if we leave
it, add a comment?

> >> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> >> index 54da54c43dc..3a5cf896da8 100644
> >> --- a/gcc/config/rs6000/vsx.md
> >> +++ b/gcc/config/rs6000/vsx.md
> >> @@ -2885,7 +2885,7 @@ (define_insn "vsx_concat_<mode>"
> >>else
> >>  gcc_unreachable ();
> >>  }
> >> -  [(set_attr "type" "vecperm")])
> >> +  [(set_attr "type" "vecperm,vecmove")])
> > 
> > mtvsrdd is a mtvsr, sorry, mffgpr just the same?  It isn't vecmove?
> > 
> >> @@ -4440,7 +4440,7 @@ (define_insn "vsx_splat_<mode>_reg"
> >>"@
> >> xxpermdi %x0,%x1,%x1,0
> >> mtvsrdd %x0,%1,%1"
> >> -  [(set_attr "type" "vecperm")])
> >> +  [(set_attr "type" "vecperm,vecmove")])
> > 
> > Same here.
> 
> mtvsrdd dispatches as a vector op, so requires a super-slice. As opposed
> to the others which just require a single execution slice for Power9.

Ah in that sense.  Okay for trunk then (and backports if we want those).
Thanks!


Segher

Re: [PATCH] libphobos: libdruntime doesn't support shadow stack (PR95680)

2020-09-09 Thread Iain Buclaw via Gcc-patches

Excerpts from Rainer Orth's message of September 8, 2020 11:34 pm:
> Hi Iain,
> 
 ---
 libphobos/ChangeLog:

 PR d/95680
 * Makefile.in: Regenerate.
 * configure: Regenerate.
 * configure.ac (DCFG_ENABLE_CET): Substitute.
 * libdruntime/Makefile.in: Regenerate.
 * libdruntime/config/x86/switchcontext.S: Remove CET support code.
 * libdruntime/core/thread.d: Import gcc.config.  Don't set version
 AsmExternal when GNU_Enable_CET is true.
 * libdruntime/gcc/config.d.in (GNU_Enable_CET): Define.
 * src/Makefile.in: Regenerate.
 * testsuite/Makefile.in: Regenerate.
>>> 
>>> Looks good.  I can try it on Tiger Lake after it has been checked in.
>>> 
>>
>> OK, I have committed it as r11-3047.
> 
> this patch broke Solaris/x86 bootstrap:
> 
> /vol/gcc/src/hg/master/local/libphobos/libdruntime/core/thread.d:3595:23: error: version AsmExternal defined after use
>  3595 | version = AsmExternal;
>   |   ^
> /vol/gcc/src/hg/master/local/libphobos/libdruntime/core/thread.d:3603:27: error: version AsmX86_Posix defined after use
>  3603 | version = AsmX86_Posix;
>   |   ^
> 
> and similarly for the 64-bit version.  libdruntime/gcc/config.d has
> 
> // Whether libphobos been configured with --enable-cet.
> enum GNU_Enable_CET = false;
> 
>   Rainer
> 

Looks like I can only use version conditions, or static if conditions.
Not both at the same time.  Found a related bug in upstream dmd
https://issues.dlang.org/show_bug.cgi?id=7386

Fixing the front-end here will not be possible without some pervasive
changes in how symbol resolving is handled.  Which is a shame.

I'm just testing passing -fversion=CET during compilation.

Iain.

---
diff --git a/libphobos/Makefile.am b/libphobos/Makefile.am
index 84d80016025..874b3a25d02 100644
--- a/libphobos/Makefile.am
+++ b/libphobos/Makefile.am
@@ -33,14 +33,14 @@ AM_MAKEFLAGS = \
"AR_FLAGS=$(AR_FLAGS)" \
"CC_FOR_BUILD=$(CC_FOR_BUILD)" \
"CC_FOR_TARGET=$(CC_FOR_TARGET)" \
-   "CCASFLAGS=$(CCASFLAGS) $(CET_FLAGS)" \
-   "CFLAGS=$(CFLAGS) $(CET_FLAGS)" \
-   "CXXFLAGS=$(CXXFLAGS) $(CET_FLAGS)" \
+   "CCASFLAGS=$(CCASFLAGS)" \
+   "CFLAGS=$(CFLAGS)" \
+   "CXXFLAGS=$(CXXFLAGS)" \
"CFLAGS_FOR_BUILD=$(CFLAGS_FOR_BUILD)" \
-   "CFLAGS_FOR_TARGET=$(CFLAGS_FOR_TARGET) $(CET_FLAGS)" \
+   "CFLAGS_FOR_TARGET=$(CFLAGS_FOR_TARGET)" \
"GDC_FOR_TARGET=$(GDC_FOR_TARGET)" \
"GDC=$(GDC)" \
-   "GDCFLAGS=$(GDCFLAGS) $(CET_FLAGS)" \
+   "GDCFLAGS=$(GDCFLAGS)" \
"INSTALL=$(INSTALL)" \
"INSTALL_DATA=$(INSTALL_DATA)" \
"INSTALL_PROGRAM=$(INSTALL_PROGRAM)" \
diff --git a/libphobos/Makefile.in b/libphobos/Makefile.in
index f6cba17159f..f692b2f719e 100644
--- a/libphobos/Makefile.in
+++ b/libphobos/Makefile.in
@@ -207,6 +207,7 @@ CC = @CC@
 CCAS = @CCAS@
 CCASFLAGS = @CCASFLAGS@
 CC_FOR_BUILD = @CC_FOR_BUILD@
+CET_DFLAGS = @CET_DFLAGS@
 CET_FLAGS = @CET_FLAGS@
 CFLAGS = @CFLAGS@
 CFLAGS_FOR_BUILD = @CFLAGS_FOR_BUILD@
@@ -216,7 +217,6 @@ CPPFLAGS = @CPPFLAGS@
 CYGPATH_W = @CYGPATH_W@
 DCFG_ARM_EABI_UNWINDER = @DCFG_ARM_EABI_UNWINDER@
 DCFG_DLPI_TLS_MODID = @DCFG_DLPI_TLS_MODID@
-DCFG_ENABLE_CET = @DCFG_ENABLE_CET@
 DCFG_HAVE_64BIT_ATOMICS = @DCFG_HAVE_64BIT_ATOMICS@
 DCFG_HAVE_ATOMIC_BUILTINS = @DCFG_HAVE_ATOMIC_BUILTINS@
 DCFG_HAVE_LIBATOMIC = @DCFG_HAVE_LIBATOMIC@
@@ -355,14 +355,14 @@ AM_MAKEFLAGS = \
"AR_FLAGS=$(AR_FLAGS)" \
"CC_FOR_BUILD=$(CC_FOR_BUILD)" \
"CC_FOR_TARGET=$(CC_FOR_TARGET)" \
-   "CCASFLAGS=$(CCASFLAGS) $(CET_FLAGS)" \
-   "CFLAGS=$(CFLAGS) $(CET_FLAGS)" \
-   "CXXFLAGS=$(CXXFLAGS) $(CET_FLAGS)" \
+   "CCASFLAGS=$(CCASFLAGS)" \
+   "CFLAGS=$(CFLAGS)" \
+   "CXXFLAGS=$(CXXFLAGS)" \
"CFLAGS_FOR_BUILD=$(CFLAGS_FOR_BUILD)" \
-   "CFLAGS_FOR_TARGET=$(CFLAGS_FOR_TARGET) $(CET_FLAGS)" \
+   "CFLAGS_FOR_TARGET=$(CFLAGS_FOR_TARGET)" \
"GDC_FOR_TARGET=$(GDC_FOR_TARGET)" \
"GDC=$(GDC)" \
-   "GDCFLAGS=$(GDCFLAGS) $(CET_FLAGS)" \
+   "GDCFLAGS=$(GDCFLAGS)" \
"INSTALL=$(INSTALL)" \
"INSTALL_DATA=$(INSTALL_DATA)" \
"INSTALL_PROGRAM=$(INSTALL_PROGRAM)" \
diff --git a/libphobos/configure b/libphobos/configure
index 3cccee748e7..05f4d7af0d2 100755
--- a/libphobos/configure
+++ b/libphobos/configure
@@ -722,7 +722,7 @@ LIBTOOL
 CFLAGS_FOR_BUILD
 CC_FOR_BUILD
 AR
-DCFG_ENABLE_CET
+CET_DFLAGS
 CET_FLAGS
 RANLIB
 MAINT
@@ -5651,12 +5651,11 @@ $as_echo "no" >&6; }
 fi
 
 
-if test x$enable_cet = xyes; then :
-  DCFG_ENABLE_CET=true
-else
-  DCFG_ENABLE_CET=false
-fi
+# To ensure that runtime code for CET is compiled in, add in D version flags.
+if test "$enable_cet" = yes; then
+  CET_DFLAGS="$CET_FLAGS -fversion=CET"
 
+fi
 
 # This should be inherited in the recu

[PATCH] correct offset range adjustment in compute_objsize (PR 96903)


In a recent change I added an incorrect adjustment to the conversion
of an offset range with inverted bounds (i.e., upper less than lower)
that caused a regression I overlooked in testing.  Attached is a fix
for this tested on x86_64-linux, and with a more comprehensive test.
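
For readers following the hunks below, here is a minimal sketch of the
adjusted logic on plain integers.  The function and parameter names are
hypothetical stand-ins for pref->offrng[0], pref->offrng[1] and
pref->sizrng[1]; the early return for an entirely negative offset range is
an assumption, since that line falls outside the context shown in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the wrapper: remaining bytes for an object of at
// most MAX_SIZE bytes accessed at an offset in [OFF_LO, OFF_HI].
std::int64_t remaining_size (std::int64_t off_lo, std::int64_t off_hi,
                             std::int64_t max_size)
{
  // Inverted bounds (upper less than lower), e.g. after wraparound.
  if (off_hi < off_lo)
    {
      if (off_hi < 0 && max_size <= off_lo)
        return 0;
      return max_size;
    }

  if (off_lo < 0)
    {
      if (off_hi < 0)
        return 0;   // assumption: not visible in the diff context
      off_lo = 0;
    }

  if (max_size <= off_lo)
    return 0;

  return max_size - off_lo;
}
```

For example, the inverted range [1, 0] against an 11-byte array yields the
full 11 bytes rather than a bogus zero.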

I plan to commit this tomorrow unless there are comments/suggestions.

Martin
PR middle-end/96903 - bogus warning on memcpy at negative offset from array end

gcc/ChangeLog:

	PR middle-end/96903
	* builtins.c (compute_objsize): Remove incorrect offset adjustment.
	(compute_objsize): Adjust offset range here instead.

gcc/testsuite/ChangeLog:

	PR middle-end/96903
	* gcc.dg/Wstringop-overflow-42.c: Add comment.
	* gcc.dg/Wstringop-overflow-43.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 97f1a184dc6..8b9a4a4d948 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -4372,12 +4372,6 @@ compute_objsize (tree ptr, int ostype, access_ref *pref,
 	  orng[0] = wi::to_offset (TYPE_MIN_VALUE (ptrdiff_type_node));
 	  orng[1] = wi::to_offset (TYPE_MAX_VALUE (ptrdiff_type_node));
 	}
-	  else if (wi::lts_p (orng[1], orng[0]))
-	/* The upper bound is less than the lower bound when the integer
-	   operand is the result of signed integer conversion to sizetype,
-	   as in P + OFF + CST where OFF > 0.
-	   Correct just the upper bound.  */
-	orng[1] = wi::to_offset (TYPE_MAX_VALUE (ptrdiff_type_node));
 
 	  pref->offrng[0] += orng[0];
 	  pref->offrng[1] += orng[1];
@@ -4403,7 +4397,8 @@ compute_objsize (tree ptr, int ostype, access_ref *pref,
   return false;
 }
 
-/* Convenience wrapper around the above.  */
+/* A "public" wrapper around the above.  Clients should use this overload
+   instead.  */
 
 static tree
 compute_objsize (tree ptr, int ostype, access_ref *pref,
@@ -4420,6 +4415,15 @@ compute_objsize (tree ptr, int ostype, access_ref *pref,
   if (!success)
     return NULL_TREE;
 
+  if (pref->offrng[1] < pref->offrng[0])
+    {
+      if (pref->offrng[1] < 0
+	  && pref->sizrng[1] <= pref->offrng[0])
+	return size_zero_node;
+
+      return wide_int_to_tree (sizetype, pref->sizrng[1]);
+    }
+
   if (pref->offrng[0] < 0)
     {
       if (pref->offrng[1] < 0)
@@ -4428,7 +4432,7 @@ compute_objsize (tree ptr, int ostype, access_ref *pref,
       pref->offrng[0] = 0;
     }
 
-  if (pref->sizrng[1] < pref->offrng[0])
+  if (pref->sizrng[1] <= pref->offrng[0])
     return size_zero_node;
 
   return wide_int_to_tree (sizetype, pref->sizrng[1] - pref->offrng[0]);
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-42.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-42.c
index 21a675ab7c7..4bb22f2ecd3 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-42.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-42.c
@@ -36,7 +36,11 @@ void cpy_sl_1_max (long i, const char *s)
 void cpy_ul_1_max (unsigned long i, const char *s)
 {
   if (i < 1) i = 1;
+
   d = strcpy (a + i, s);  // { dg-warning "writing 1 or more bytes into a region of size 0" }
+
+  /* Because of integer wraparound the offset's range is [1, 0] so
+     the overflow isn't diagnosed (yet).  */
   d = strcpy (a + i + 1, s);  // { dg-warning "writing 1 or more bytes into a region of size 0" "" { xfail *-*-* } }
 }
 
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-43.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-43.c
new file mode 100644
index 000..3ac5a88e4b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-43.c
@@ -0,0 +1,178 @@
+/* PR 96903 - bogus warning on memcpy at negative offset from array end
+   { dg-do compile }
+   { dg-options "-O2 -Wall -Wno-array-bounds -ftrack-macro-expansion=0" } */
+
+#include "range.h"
+
+#define INT_MAX    __INT_MAX__
+#define INT_MIN    -(INT_MAX - 1)
+#define UINT_MAX   (2U * INT_MAX + 1)
+
+typedef __SIZE_TYPE__ size_t;
+
+void* memset (void *, int, size_t);
+
+void sink (void*, ...);
+
+extern char a11[11];
+struct S { char a11[11], b; };
+extern struct S sa11;
+
+#define T2(dst, off1, off2, n) do {		\
+    char *_p0 = dst;				\
+    char *_p1 = _p0 + (off1);			\
+    char *_p2 = _p1 + (off2);			\
+    memset (_p2, 0, n);				\
+    sink (dst, _p0, _p1, _p2);			\
+  } while (0);
+
+#define T1(dst, off, n) T2 (dst, off, 0, n)
+
+
+void nowarn_memset_array_cst (void)
+{
+  char *p = &a11[11];
+
+  T1 (p, -11, 11);
+  T1 (p, -10, 10);
+  T1 (p,  -9,  9);
+  T1 (p,  -8,  8);
+  T1 (p,  -3,  3);
+  T1 (p,  -2,  2);
+  T1 (p,  -1,  1);
+  T1 (p,   0,  0);
+
+  T2 (p, -6, -5, 11);
+  T2 (p, -6, -4, 10);
+  T2 (p, -6, -3,  9);
+  T2 (p, -6, -2,  8);
+  T2 (p, -6, -1,  7);
+  T2 (p, -5, -6, 11);
+  T2 (p, -5, -5, 10);
+}
+
+void nowarn_memset_array_rng_int (void)
+{
+  char *p = &a11[11];
+
+  int i11 = SR (11, INT_MAX);
+  int i10 = SR (10, INT_MAX);
+  int i9  = SR ( 9, INT_MAX);
+  int i3  = SR ( 3, INT_MAX);
+  int i2  = SR ( 2, INT_MAX);
+  int i1  = SR ( 1, INT_MAX);
+  int i0  = SR ( 0, INT_MAX);
+
+  int m11 = SR (INT_MIN, -11);
+  int m10 = SR (INT_MIN, -10);
+  int m9  =

Re: [PATCH V2 0/4] Unify C and C++ handling of loops and switches