Re: [PATCH] store-merging: Fix coalesce_immediate_stores [PR93820]

2020-02-26 Thread Richard Biener
On Wed, 26 Feb 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled in 8+.
> The problem is that check_no_overlap has a special case for INTEGER_CST
> marked stores (i.e. stores of constants): if all the currently merged stores
> and the one under consideration for merging with them are marked that way,
> it anticipates that other INTEGER_CST marked stores that overlap with those
> and precede them (have smaller info->order) could be merged with them and
> doesn't punt.
> In PR86844 and PR87859 fixes I've then added quite large code that is
> performed after check_no_overlap and tries to find out if we need and can
> merge further INTEGER_CST marked stores, or need to punt.
> Unfortunately, that code is there only in the overlapping case code and
> the testcase below shows that we really need it even in the adjacent store
> case.  After sort_by_bitpos we have:
> bitpos  width  order  rhs_code
> 96      32     3      INTEGER_CST
> 128     32     1      INTEGER_CST
> 128     128    2      INTEGER_CST
> 192     32     0      MEM_REF
> Because of the missing PR86844/PR87859-ish code in the adjacent store
> case, we merge the adjacent (memory wise) stores 96/32/3 and 128/32/1,
> and then we consider the 128-bit store which is in program-order in between
> them, but in this case we punt, because the merging would extend the
> merged store region from bitpos 96 and 64-bits to bitpos 96 and 160-bits
> and that has an overlap with an incompatible store (the MEM_REF one).
> The problem is that we can't really punt this way, because the 128-bit
> store is in between those two we've merged already, so either we manage
> to merge even that one together with the others, or would need to avoid
> already merging the 96/32/3 and 128/32/1 stores together.
> Now, rather than copying around the PR86844/PR87859 code to the other spot,
> we can actually just use the overlapping code, merge_overlapping is really
> a superset of merge_into, so that is what the patch does.  If doing
> adjacent store merge for rhs_code other than INTEGER_CST, I believe the
> current code is already fine, check_no_overlap in that case doesn't make
> the exception and will punt if there is some earlier (smaller order)
> non-mergeable overlapping store.  There is just one case that could be
> problematic, if the merged_store has BIT_INSERT_EXPRs in them and the
> new store is a constant store (INTEGER_CST rhs_code), then check_no_overlap
> would do the exception and still would allow the special case.  But we
> really shouldn't have the special case in that case, so this patch also
> changes check_no_overlap to just have a bool whether we should have the
> special case or not.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux on the trunk as well
> as a corresponding backport to the 8 branch, ok for trunk and release branches?

OK.

> Note, as I said in the PR, for GCC11 we could consider performing some kind
> of cheap DSE during the store merging (perhaps guarded with flag_tree_dse).
> And another thing to consider is only consider as problematic non-mergeable
> stores that not only have order smaller than last_order as currently, but
> also have order larger than first_order, as in this testcase if we actually
> ignored (not merged with anything at all) the 192/32/0 store, because it is
> not in between the other stores we'd merge, it would be fine to merge the
> other 3 stores, though of course the testcase can be easily adjusted by
> putting the 192/32 store after the 128/32 store and then this patch would be
> still needed.  Though, I think I'd need more time thinking this over.

When looking at the PR I thought that the issue must be with the sorting
since if there's no intermediate store breaking the chain we should be
able to "merge" all INTEGER_CSTs by simply native encoding them in
program order?  (similar to what VN does with partial defs)

Richard.

> 2020-02-25  Jakub Jelinek  
> 
>   PR tree-optimization/93820
>   * gimple-ssa-store-merging.c (check_no_overlap): Change RHS_CODE
>   argument to ALL_INTEGER_CST_P boolean.
>   (imm_store_chain_info::try_coalesce_bswap): Adjust caller.
>   (imm_store_chain_info::coalesce_immediate_stores): Likewise.  Handle
>   adjacent INTEGER_CST store into merged_store->only_constants like
>   overlapping one.
> 
>   * gcc.dg/pr93820.c: New test.
> 
> --- gcc/gimple-ssa-store-merging.c.jj 2020-02-13 10:03:39.0 +0100
> +++ gcc/gimple-ssa-store-merging.c	2020-02-25 18:47:40.303092315 +0100
> @@ -2370,8 +2370,9 @@ gather_bswap_load_refs (vec *refs,
>  /* Check if there are any stores in M_STORE_INFO after index I
> (where M_STORE_INFO must be sorted by sort_by_bitpos) that overlap
> a potential group ending with END that have their order
> -   smaller than LAST_ORDER.  RHS_CODE is the kind of store in the
> -   group.  Return true if there are no such stores.
> +   smaller than LAST_ORDER.  ALL_INTEGER_CST_P is

Re: [RFC] Run store-merging pass once more before pass fre/pre

2020-02-26 Thread luoxhu
On 2020/2/18 17:57, Richard Biener wrote:
> On Tue, 18 Feb 2020, Xionghu Luo wrote:
> 
>> Store-merging pass should run twice, the reason is pass fre/pre will do
>> some kind of optimizations to instructions by:
>>1. Converting the load from address to load from function arguments
>>(store_merging_30.c:foo1).
>>2. Converting the byte access to BIT_FIELD_REF(store_merging_30.c:foo2).
>>3. Other bitfield combinations or potential interference optimizations 
>> etc.
>> These optimizations will break the store chain, store-merging pass fails
>> to catch such kind of pattern so stores are not merged in middle end,
>> then consecutive stb/sth instructions(should be merged to stw) are emitted
>> finally.
>>
>> And why not directly move store-merging pass(numbered 194) just before
>> fre1(numbered 35) is for case store_merging_14.c, 5 merges are done by
>> store_merging1, and 4 merges are done by store_merge2. So, keep the
>> original store_merge as store_merge2 as store merge may be still available
>> after other pass optimizations.  Most of the 30 store_merging_N.c test
>> case dg-final pass name would be updated from store-merging to
>> store-merging1 once this RFC patch idea got confirmed.
>> Any comments?  Thanks.
> 
> Generally trying to solve a pass ordering issue by re-ordering/duplicating
> passes might help a single testcase but will surely pessimize others.
> So that's a no-go.
> 
> What the testcase shows is that store-merging needs to operate
> similar to bswap when trying to find a "common" source for a combined
> store.  That is, it should appropriately follow defs.  For foo2 I expect
> it to be not too difficult, for foo1 I'd say we miss a FRE opportunity
> here (but Jakub is working on that one IIRC).

Thanks Richard, I am not sure about my understanding, so please correct me
if I am wrong.

I tried Jakub's latest patch, "sccvn: Handle bitfields in push_partial_def".
I learned that the fre pass checks the load instruction's vuse chain and does
the constant bitfield combines in push_partial_def; native_encode_expr and
native_interpret_expr can then decode and encode the constant content and
shift/combine the data.  This relies on one change to my test case (adding
"return page->counters;") to trigger the fre pass to push all of the SSA
name's partial_defs.  Currently this encode/interpret is not supported for
SSA variables yet; I suppose that is the opportunity you mean.  As this
case's input is not constant, Jakub's patch doesn't fix it.

struct page
{
  union
  {
unsigned counters;
struct
{
  union
  {
struct
{
  unsigned inuse : 16;
  unsigned inuse2 : 8;
  unsigned objects : 5;
  unsigned frozen : 3;
};
  };
};
  };
};

unsigned
foo1 (struct page *page, unsigned long counters_new)
{
struct page tmp;
tmp.counters = counters_new;
page->inuse   = tmp.inuse;
page->inuse2  = tmp.inuse2;
page->objects = tmp.objects;
page->frozen  = tmp.frozen;
return page->counters;
}

If "return page->counters;" is removed, this case won't match fre's current
code path in vn_reference_lookup_3, since without a load
(_14 = page_9(D)->D.2912.counters) there is nothing to walk all the vuses.
So it seems to be not an fre opportunity but exactly a store-merging issue
(I also checked llvm: it doesn't generate byte or halfword stores, so it
produces only 1 stw for all patterns).

Optimizing this case in the store-merging pass is reasonable; just as you
said, it means "trying to find a "common" source for a combined store", and
that requires changing the store-merging pass's behavior: so far the pass
tracks every store instruction's RHS and stops when the RHS is a load
instruction with <bitsize, bitpos, bitregion_start, bitregion_end>
extracted; the convert instructions and bitfield instructions generated by
pass fre are ignored in pass_store_merging::process_store.  Teaching
process_store to capture the real common source requires refactoring
handled_load to keep tracking even when the RHS is a load instruction and to
support the convert and bitfield instructions, which seems to be a big
change.

Regarding Jakub's comments, merging could remove many byte stores and
halfword stores and improve performance for this type of case.  There is
already an 033t.esra pass running before, and I am not sure whether SRA
should replace such kind of bitfield operations.  Adding another
store-merging pass is simple, and many passes already run more than once...
So which approach would be best for this optimization, please?  Thanks
again :)

Xionghu

> 
> Richard.
> 
>> PS:
>> Before this patch, store_merging_30.c.035t.fre1:
>>
>> ... foo1:
>> Inserted _13 = (short unsigned int) counters_new_5(D);
>> Replaced tmp.D.2912.D.2911.D.2910.D.2909.inuse with _13 in all uses of
>> _1 = tmp.D.2912.D.2911.D.2910.D.2909.inuse;
>> Removing dead stmt _1 = tmp.D.2912.D.2911.D.2910.D.2909.inuse;
>> ... foo2:
>> Inserted _17 = BIT_FIELD_REF <_1, 8, 16>;
>> Replaced tmp.D.2926.D.2925.D.2924.D.2923.objects with _17 in all uses of
>> _3 = tm

Re: GLIBC libmvec status

2020-02-26 Thread Jakub Jelinek
On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:
> The reason that homogeneous aggregates matter (at least somewhat) is that
> the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
> mangling formula that includes the length of parameters and return values.
> Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
> has a superior one, we chose to use ELFv2's calling convention and make use
> of homogeneous aggregates for return values in registers for the case of
> vectorized sincos.

Can you please explain how you want to pass the
void sincos (double, double *, double *);
arguments?  I must say it isn't entirely clear from the document.
You talk there about double[2], but sincos certainly doesn't have such an
argument.

Also, I'd say ignoring the masked variants is a mistake, are you going to
warn any time the user uses inbranch or even doesn't specify notinbranch?
The masking can be implemented even without highly specialized instructions,
e.g. on x86 only AVX512F has full masking support, for older ISAs all that
is there is conditional store or e.g. for integral operations that can't
trap/raise exceptions just doing blend-like operations (or even and/or) is
all that is needed; just let the vectorizer do its job.

Even if you don't want it for libmvec, just use
__attribute__((simd ("notinbranch")))
for those, but allow the user to use it where it makes sense.

Jakub



Re: [PATCH] store-merging: Fix coalesce_immediate_stores [PR93820]

2020-02-26 Thread Jakub Jelinek
On Wed, Feb 26, 2020 at 09:09:11AM +0100, Richard Biener wrote:
> > Note, as I said in the PR, for GCC11 we could consider performing some kind
> > of cheap DSE during the store merging (perhaps guarded with flag_tree_dse).
> > And another thing to consider is only consider as problematic non-mergeable
> > stores that not only have order smaller than last_order as currently, but
> > also have order larger than first_order, as in this testcase if we actually
> > ignored (not merged with anything at all) the 192/32/0 store, because it is
> > not in between the other stores we'd merge, it would be fine to merge the
> > other 3 stores, though of course the testcase can be easily adjusted by
> > putting the 192/32 store after the 128/32 store and then this patch would be
> > still needed.  Though, I think I'd need more time thinking this over.
> 
> When looking at the PR I thought that the issue must be with the sorting
> since if there's no intermediate store breaking the chain we should be
> able to "merge" all INTEGER_CSTs by simply native encoding them in
> program order?  (similar to what VN does with partial defs)

They are actually merged in program order; there is
  stores.qsort (sort_by_order);
at the start of apply_stores.  It is just that, for deciding what to
coalesce together, it is better to process them in bitpos order, to easily
find out what is adjacent, overlapping, etc.

Jakub



Re: [RFC] Run store-merging pass once more before pass fre/pre

2020-02-26 Thread Richard Biener
On Wed, 26 Feb 2020, luoxhu wrote:

> On 2020/2/18 17:57, Richard Biener wrote:
> > On Tue, 18 Feb 2020, Xionghu Luo wrote:
> > 
> >> Store-merging pass should run twice, the reason is pass fre/pre will do
> >> some kind of optimizations to instructions by:
> >>1. Converting the load from address to load from function arguments
> >>(store_merging_30.c:foo1).
> >>2. Converting the byte access to BIT_FIELD_REF(store_merging_30.c:foo2).
> >>3. Other bitfield combinations or potential interference optimizations 
> >> etc.
> >> These optimizations will break the store chain, store-merging pass fails
> >> to catch such kind of pattern so stores are not merged in middle end,
> >> then consecutive stb/sth instructions(should be merged to stw) are emitted
> >> finally.
> >>
> >> And why not directly move store-merging pass(numbered 194) just before
> >> fre1(numbered 35) is for case store_merging_14.c, 5 merges are done by
> >> store_merging1, and 4 merges are done fore store_merge2. So, keep the
> >> original store_merge as store_merge2 as store merge may be still available
> >> after other pass optimizations.  Most of the 30 store_merging_N.c test
> >> case dg-final pass name would be updated from store-merging to
> >> store-merging1 once this RFC patch idea got confirmed.
> >> Any comments?  Thanks.
> > 
> > Generally trying to solve a pass ordering issue by re-ordering/duplicating
> > passes might help a single testcase but will surely pessimize others.
> > So that's a no-go.
> > 
> > What the testcase shows is that store-merging needs to operate
> > similar to bswap when trying to find a "common" source for a combined
> > store.  That is, it should appropriately follow defs.  For foo2 I expect
> > it to be not too difficult, for foo1 I'd say we miss a FRE opportunity
> > here (but Jakub is working on that one IIRC).
> 
> Thanks Richard, not sure about my understanding and please correct if any.
> 
> I tried Jukub's latest patch of "sccvn: Handle bitfields in push_partial_def".
> Got to know fre pass checks the load instruction's vuse chain and do the 
> constant
> bitfield combines in push_partial_def, then 
> native_encode_expr/native_interpret_expr
> can decode and encode the constant content and shift/combine the data.  
> This should be based on one change to my test case(by adding return 
> page->counters;)
> to trigger the fre pass push all SSA name's partial_defs.  Currently, for SSA 
> variables,
> this encode/interpret is not supported yet, I suppose this is the opportunity 
> you mean.
> As this case's input is not constant, so Jukub's patch doesn't fix it.
> 
> struct page
> {
>   union
>   {
> unsigned counters;
> struct
> {
>   union
>   {
> struct
> {
>   unsigned inuse : 16;
>   unsigned inuse2 : 8;
>   unsigned objects : 5;
>   unsigned frozen : 3;
> };
>   };
> };
>   };
> };
> 
> unsigned
> foo1 (struct page *page, unsigned long counters_new)
> {
> struct page tmp;
> tmp.counters = counters_new;
> page->inuse   = tmp.inuse;
> page->inuse2  = tmp.inuse2;
> page->objects = tmp.objects;
> page->frozen  = tmp.frozen;
> return page->counters;
> }
> 
> If "return page->counters;" is removed, this case won't match the fre's 
> current code path
> in vn_reference_lookup_3 without a load(_14 = page_9(D)->D.2912.counters) to 
> walk all the vuses.
> So it seems not a fre opportunity but exactly a store merging issue(Also 
> checked llvm that it 
> doesn't generate byte store and short store so it only produces 1 stw for all 
> patterns).

Huh, but the code has the return page->counters and you expect it to
return counters_new unchanged.  Yes, you also expect the code to be
optimized to

   page->counters = counters_new;
   return counters_new;

and that part would indeed be a store-merging opportunity.  Currently
FRE does

  _1 = (unsigned int) counters_new_6(D);
  tmp.D.1943.counters = _1;
  _18 = (short unsigned int) counters_new_6(D);
  page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse = _18;
  _19 = BIT_FIELD_REF <_1, 8, 16>;
  page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse2 = _19;
  _4 = tmp.D.1943.D.1942.D.1941.D.1940.objects;
  page_9(D)->D.1943.D.1942.D.1941.D.1940.objects = _4;
  _5 = tmp.D.1943.D.1942.D.1941.D.1940.frozen;
  page_9(D)->D.1943.D.1942.D.1941.D.1940.frozen = _5;

so it fails to optimize the other loads to BIT_FIELD_REFs of _1.  If it
did that then store-merging would already handle the code I think.

> Optimize this case in pass store-merging is reasonable, just as you said,
> "trying to find a "common" source for a combined store.", it requires break
> store-merging pass's behavior: so far store-merging pass track all store
> instruction's RHS and stops when RHS is a load instruction with  bitsize,
> bitpos, bitregion_start, bitregion_end> extracted, convert instructions and 
> bitfield
> instructions generated by pass fre are ignor

Re: [RFC] Run store-merging pass once more before pass fre/pre

2020-02-26 Thread Jakub Jelinek
On Wed, Feb 26, 2020 at 04:09:16PM +0800, luoxhu wrote:
> Thanks Richard, not sure about my understanding and please correct if any.
> 
> I tried Jukub's latest patch of "sccvn: Handle bitfields in push_partial_def".
> Got to know fre pass checks the load instruction's vuse chain and do the 
> constant
> bitfield combines in push_partial_def, then 
> native_encode_expr/native_interpret_expr
> can decode and encode the constant content and shift/combine the data.  
> This should be based on one change to my test case(by adding return 
> page->counters;)
> to trigger the fre pass push all SSA name's partial_defs.  Currently, for SSA 
> variables,
> this encode/interpret is not supported yet, I suppose this is the opportunity 
> you mean.
> As this case's input is not constant, so Jukub's patch doesn't fix it.

Yeah, I've looked at your case and the problem is that
 tmp.counters = counters_new;
 page->inuse   = tmp.inuse;
 page->inuse2  = tmp.inuse2;
 page->objects = tmp.objects;
 page->frozen  = tmp.frozen;
we optimize it into essentially:
 tmp.counters = counters_new;
 page->inuse   = (cast) counters_new;
 page->inuse2  = tmp.inuse2;
 page->objects = tmp.objects;
 page->frozen  = tmp.frozen;
and so it is no longer recognized as a mem copy.  What we could do is just
teach store-merging to virtually undo that optimization and consider the
store to be a MEM_REF too.  Especially for the BIT_INSERT_EXPR Eric has
added quite a few similar optimizations already.

Jakub



Re: [RFC] Run store-merging pass once more before pass fre/pre

2020-02-26 Thread Richard Biener
On Wed, 26 Feb 2020, Richard Biener wrote:

> On Wed, 26 Feb 2020, luoxhu wrote:
> 
> > On 2020/2/18 17:57, Richard Biener wrote:
> > > On Tue, 18 Feb 2020, Xionghu Luo wrote:
> > > 
> > >> Store-merging pass should run twice, the reason is pass fre/pre will do
> > >> some kind of optimizations to instructions by:
> > >>1. Converting the load from address to load from function arguments
> > >>(store_merging_30.c:foo1).
> > >>2. Converting the byte access to 
> > >> BIT_FIELD_REF(store_merging_30.c:foo2).
> > >>3. Other bitfield combinations or potential interference 
> > >> optimizations etc.
> > >> These optimizations will break the store chain, store-merging pass fails
> > >> to catch such kind of pattern so stores are not merged in middle end,
> > >> then consecutive stb/sth instructions(should be merged to stw) are 
> > >> emitted
> > >> finally.
> > >>
> > >> And why not directly move store-merging pass(numbered 194) just before
> > >> fre1(numbered 35) is for case store_merging_14.c, 5 merges are done by
> > >> store_merging1, and 4 merges are done fore store_merge2. So, keep the
> > >> original store_merge as store_merge2 as store merge may be still 
> > >> available
> > >> after other pass optimizations.  Most of the 30 store_merging_N.c test
> > >> case dg-final pass name would be updated from store-merging to
> > >> store-merging1 once this RFC patch idea got confirmed.
> > >> Any comments?  Thanks.
> > > 
> > > Generally trying to solve a pass ordering issue by re-ordering/duplicating
> > > passes might help a single testcase but will surely pessimize others.
> > > So that's a no-go.
> > > 
> > > What the testcase shows is that store-merging needs to operate
> > > similar to bswap when trying to find a "common" source for a combined
> > > store.  That is, it should appropriately follow defs.  For foo2 I expect
> > > it to be not too difficult, for foo1 I'd say we miss a FRE opportunity
> > > here (but Jakub is working on that one IIRC).
> > 
> > Thanks Richard, not sure about my understanding and please correct if any.
> > 
> > I tried Jukub's latest patch of "sccvn: Handle bitfields in 
> > push_partial_def".
> > Got to know fre pass checks the load instruction's vuse chain and do the 
> > constant
> > bitfield combines in push_partial_def, then 
> > native_encode_expr/native_interpret_expr
> > can decode and encode the constant content and shift/combine the data.  
> > This should be based on one change to my test case(by adding return 
> > page->counters;)
> > to trigger the fre pass push all SSA name's partial_defs.  Currently, for 
> > SSA variables,
> > this encode/interpret is not supported yet, I suppose this is the 
> > opportunity you mean.
> > As this case's input is not constant, so Jukub's patch doesn't fix it.
> > 
> > struct page
> > {
> >   union
> >   {
> > unsigned counters;
> > struct
> > {
> >   union
> >   {
> > struct
> > {
> >   unsigned inuse : 16;
> >   unsigned inuse2 : 8;
> >   unsigned objects : 5;
> >   unsigned frozen : 3;
> > };
> >   };
> > };
> >   };
> > };
> > 
> > unsigned
> > foo1 (struct page *page, unsigned long counters_new)
> > {
> > struct page tmp;
> > tmp.counters = counters_new;
> > page->inuse   = tmp.inuse;
> > page->inuse2  = tmp.inuse2;
> > page->objects = tmp.objects;
> > page->frozen  = tmp.frozen;
> > return page->counters;
> > }
> > 
> > If "return page->counters;" is removed, this case won't match the fre's 
> > current code path
> > in vn_reference_lookup_3 without a load(_14 = page_9(D)->D.2912.counters) 
> > to walk all the vuses.
> > So it seems not a fre opportunity but exactly a store merging issue(Also 
> > checked llvm that it 
> > doesn't generate byte store and short store so it only produces 1 stw for 
> > all patterns).
> 
> Huh, but the code has the return page->counters and you expect it to
> return counters_new unchanged.  Yes, you also expect the code to be
> optimized to
> 
>page->counters = counters_new;
>return counters_new;
> 
> and that part would indeed be a store-merging opportunity.  Currently
> FRE does
> 
>   _1 = (unsigned int) counters_new_6(D);
>   tmp.D.1943.counters = _1;
>   _18 = (short unsigned int) counters_new_6(D);
>   page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse = _18;
>   _19 = BIT_FIELD_REF <_1, 8, 16>;
>   page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse2 = _19;
>   _4 = tmp.D.1943.D.1942.D.1941.D.1940.objects;
>   page_9(D)->D.1943.D.1942.D.1941.D.1940.objects = _4;
>   _5 = tmp.D.1943.D.1942.D.1941.D.1940.frozen;
>   page_9(D)->D.1943.D.1942.D.1941.D.1940.frozen = _5;
> 
> so it fails to optimize the other loads to BIT_FIELD_REFs of _1.  If it
> did that then store-merging would already handle the code I think.

FRE fails to do that because of

  /* ???  We can't handle bitfield precision extracts without
 either usi

Re: [RFC] Run store-merging pass once more before pass fre/pre

2020-02-26 Thread Andrew Pinski
On Wed, Feb 26, 2020 at 12:30 AM Richard Biener  wrote:
>
> On Wed, 26 Feb 2020, luoxhu wrote:
>
> > On 2020/2/18 17:57, Richard Biener wrote:
> > > On Tue, 18 Feb 2020, Xionghu Luo wrote:
> > >
> > >> Store-merging pass should run twice, the reason is pass fre/pre will do
> > >> some kind of optimizations to instructions by:
> > >>1. Converting the load from address to load from function arguments
> > >>(store_merging_30.c:foo1).
> > >>2. Converting the byte access to 
> > >> BIT_FIELD_REF(store_merging_30.c:foo2).
> > >>3. Other bitfield combinations or potential interference 
> > >> optimizations etc.
> > >> These optimizations will break the store chain, store-merging pass fails
> > >> to catch such kind of pattern so stores are not merged in middle end,
> > >> then consecutive stb/sth instructions(should be merged to stw) are 
> > >> emitted
> > >> finally.
> > >>
> > >> And why not directly move store-merging pass(numbered 194) just before
> > >> fre1(numbered 35) is for case store_merging_14.c, 5 merges are done by
> > >> store_merging1, and 4 merges are done fore store_merge2. So, keep the
> > >> original store_merge as store_merge2 as store merge may be still 
> > >> available
> > >> after other pass optimizations.  Most of the 30 store_merging_N.c test
> > >> case dg-final pass name would be updated from store-merging to
> > >> store-merging1 once this RFC patch idea got confirmed.
> > >> Any comments?  Thanks.
> > >
> > > Generally trying to solve a pass ordering issue by re-ordering/duplicating
> > > passes might help a single testcase but will surely pessimize others.
> > > So that's a no-go.
> > >
> > > What the testcase shows is that store-merging needs to operate
> > > similar to bswap when trying to find a "common" source for a combined
> > > store.  That is, it should appropriately follow defs.  For foo2 I expect
> > > it to be not too difficult, for foo1 I'd say we miss a FRE opportunity
> > > here (but Jakub is working on that one IIRC).
> >
> > Thanks Richard, not sure about my understanding and please correct if any.
> >
> > I tried Jukub's latest patch of "sccvn: Handle bitfields in 
> > push_partial_def".
> > Got to know fre pass checks the load instruction's vuse chain and do the 
> > constant
> > bitfield combines in push_partial_def, then 
> > native_encode_expr/native_interpret_expr
> > can decode and encode the constant content and shift/combine the data.
> > This should be based on one change to my test case(by adding return 
> > page->counters;)
> > to trigger the fre pass push all SSA name's partial_defs.  Currently, for 
> > SSA variables,
> > this encode/interpret is not supported yet, I suppose this is the 
> > opportunity you mean.
> > As this case's input is not constant, so Jukub's patch doesn't fix it.
> >
> > struct page
> > {
> >   union
> >   {
> > unsigned counters;
> > struct
> > {
> >   union
> >   {
> > struct
> > {
> >   unsigned inuse : 16;
> >   unsigned inuse2 : 8;
> >   unsigned objects : 5;
> >   unsigned frozen : 3;
> > };
> >   };
> > };
> >   };
> > };
> >
> > unsigned
> > foo1 (struct page *page, unsigned long counters_new)
> > {
> > struct page tmp;
> > tmp.counters = counters_new;
> > page->inuse   = tmp.inuse;
> > page->inuse2  = tmp.inuse2;
> > page->objects = tmp.objects;
> > page->frozen  = tmp.frozen;
> > return page->counters;
> > }
> >
> > If "return page->counters;" is removed, this case won't match the fre's 
> > current code path
> > in vn_reference_lookup_3 without a load(_14 = page_9(D)->D.2912.counters) 
> > to walk all the vuses.
> > So it seems not a fre opportunity but exactly a store merging issue(Also 
> > checked llvm that it
> > doesn't generate byte store and short store so it only produces 1 stw for 
> > all patterns).
>
> Huh, but the code has the return page->counters and you expect it to
> return counters_new unchanged.  Yes, you also expect the code to be
> optimized to
>
>page->counters = counters_new;
>return counters_new;
>
> and that part would indeed be a store-merging opportunity.  Currently
> FRE does
>
>   _1 = (unsigned int) counters_new_6(D);
>   tmp.D.1943.counters = _1;
>   _18 = (short unsigned int) counters_new_6(D);
>   page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse = _18;
>   _19 = BIT_FIELD_REF <_1, 8, 16>;
>   page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse2 = _19;
>   _4 = tmp.D.1943.D.1942.D.1941.D.1940.objects;
>   page_9(D)->D.1943.D.1942.D.1941.D.1940.objects = _4;
>   _5 = tmp.D.1943.D.1942.D.1941.D.1940.frozen;
>   page_9(D)->D.1943.D.1942.D.1941.D.1940.frozen = _5;
>
> so it fails to optimize the other loads to BIT_FIELD_REFs of _1.  If it
> did that then store-merging would already handle the code I think.
>
> > Optimize this case in pass store-merging is reasonable, just as you said,
> > "trying to find a "common" source for a combined store

[committed] testsuite: Add a -O2 -fgimple testcase next to the -O2 -fno-tree-dse one [PR93820]

2020-02-26 Thread Jakub Jelinek
Hi!

Based on IRC discussions, I've added another test with __GIMPLE next to the
one with -fno-tree-dse.

2020-02-26  Jakub Jelinek  

PR tree-optimization/93820
* gcc.dg/pr93820-2.c: New test.

--- gcc/testsuite/gcc.dg/pr93820-2.c.jj 2020-02-26 10:50:18.796137652 +0100
+++ gcc/testsuite/gcc.dg/pr93820-2.c	2020-02-26 10:51:14.833310929 +0100
@@ -0,0 +1,30 @@
+/* PR tree-optimization/93820 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fgimple" } */
+
+typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
+int a[10];
+
+void __GIMPLE (ssa,startwith("store-merging"))
+foo (int *p)
+{
+  int _2;
+  __BB(2):
+  _2 = *p_1(D);
+  a[6] = _2;
+  a[4] = 1;
+  __MEM <v4si> ((int *)&a + _Literal (int *) 16) = _Literal (v4si) { 0, 0, 0, 0 };
+  a[3] = 0;
+  return;
+}
+
+int
+main ()
+{
+  int i = 0;
+  foo (&i);
+  for (i = 0; i < 10; i++)
+if (a[i])
+  __builtin_abort ();
+  return 0;
+}

Jakub



[PATCH] gimple-fold: Verify builtin prototype before folding [PR93927]

2020-02-26 Thread Jakub Jelinek
Hi!

The following patch does the same check the tree-ssa-strlen.c pass does
to punt on broken builtin redeclarations.
I agree the ultimate fix is to change the C FE, but I don't feel I know that
part of the FE well enough to know all the consequences, so it might be too
risky for stage4.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-02-26  Jakub Jelinek  

PR tree-optimization/93927
* gimple-fold.c (gimple_fold_builtin): Punt if callee has incompatible
prototype.

* gcc.c-torture/compile/pr93927-1.c: New test.
* gcc.c-torture/compile/pr93927-2.c: New test.

--- gcc/gimple-fold.c.jj2020-02-06 09:13:21.732085293 +0100
+++ gcc/gimple-fold.c   2020-02-25 11:46:14.863796864 +0100
@@ -3906,6 +3906,12 @@ gimple_fold_builtin (gimple_stmt_iterato
   if (avoid_folding_inline_builtin (callee))
 return false;
 
+  tree decl = builtin_decl_explicit (DECL_FUNCTION_CODE (callee));
+  if (decl
+  && decl != callee
+  && !gimple_builtin_call_types_compatible_p (stmt, decl))
+return false;
+
   unsigned n = gimple_call_num_args (stmt);
   enum built_in_function fcode = DECL_FUNCTION_CODE (callee);
   switch (fcode)
--- gcc/testsuite/gcc.c-torture/compile/pr93927-1.c.jj  2020-02-25 
11:47:10.983971425 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr93927-1.c 2020-02-25 
11:46:43.004382966 +0100
@@ -0,0 +1,9 @@
+/* PR tree-optimization/93927 */
+
+__SIZE_TYPE__ strstr (const char *, const char *);
+
+char *
+foo (char *x)
+{
+  return !!strstr (x, "0") ? "0" : "1";
+}
--- gcc/testsuite/gcc.c-torture/compile/pr93927-2.c.jj  2020-02-25 11:47:13.930928081 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr93927-2.c 2020-02-25 11:46:57.304172632 +0100
@@ -0,0 +1,9 @@
+/* PR tree-optimization/93927 */
+
+__SIZE_TYPE__ strchr (const char *, int);
+
+char *
+foo (char *x)
+{
+  return !!strchr (x, 0) ? "0" : "1";
+}

Jakub



Re: [PATCH] gimple-fold: Verify builtin prototype before folding [PR93927]

2020-02-26 Thread Richard Biener
On Wed, 26 Feb 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following patch does the same check the tree-ssa-strlen.c pass does
> to punt on broken builtin redeclarations.
> I agree the ultimate fix is change the C FE, but don't feel I know that part
> of the FE good enough to know all the consequences and so it might be too
> risky for stage4.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Iff then we should make this change in 
gimple_builtin_call_types_compatible_p, not in gimple_fold_builtin.
I pasted such change in another recent PR but then backed off since
this is really ugly and instead the FEs should stop claiming sth is
BUILT_IN_FOO when it is not.  I even considered doing that during
cgraph build or so if the FE side is too difficult.  We could
even use exactly the same check there.

Richard.

> 2020-02-26  Jakub Jelinek  
> 
>   PR tree-optimization/93927
>   * gimple-fold.c (gimple_fold_builtin): Punt if callee has incompatible
>   prototype.
> 
>   * gcc.c-torture/compile/pr93927-1.c: New test.
>   * gcc.c-torture/compile/pr93927-2.c: New test.
> 
> --- gcc/gimple-fold.c.jj  2020-02-06 09:13:21.732085293 +0100
> +++ gcc/gimple-fold.c 2020-02-25 11:46:14.863796864 +0100
> @@ -3906,6 +3906,12 @@ gimple_fold_builtin (gimple_stmt_iterato
>if (avoid_folding_inline_builtin (callee))
>  return false;
>  
> +  tree decl = builtin_decl_explicit (DECL_FUNCTION_CODE (callee));
> +  if (decl
> +  && decl != callee
> +  && !gimple_builtin_call_types_compatible_p (stmt, decl))
> +return false;
> +
>unsigned n = gimple_call_num_args (stmt);
>enum built_in_function fcode = DECL_FUNCTION_CODE (callee);
>switch (fcode)
> --- gcc/testsuite/gcc.c-torture/compile/pr93927-1.c.jj2020-02-25 
> 11:47:10.983971425 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr93927-1.c   2020-02-25 
> 11:46:43.004382966 +0100
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/93927 */
> +
> +__SIZE_TYPE__ strstr (const char *, const char *);
> +
> +char *
> +foo (char *x)
> +{
> +  return !!strstr (x, "0") ? "0" : "1";
> +}
> --- gcc/testsuite/gcc.c-torture/compile/pr93927-2.c.jj2020-02-25 
> 11:47:13.930928081 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr93927-2.c   2020-02-25 
> 11:46:57.304172632 +0100
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/93927 */
> +
> +__SIZE_TYPE__ strchr (const char *, int);
> +
> +char *
> +foo (char *x)
> +{
> +  return !!strchr (x, 0) ? "0" : "1";
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

middle-end: Fix wrong code caused by disagreement between FRE and access path oracle [PR 92152]

2020-02-26 Thread Jan Hubicka
Hi,
This patch solves a problem caused by a disagreement between FRE and the
access path oracle.

FRE checks stores for equivalence based on their address, value and base+ref
alias sets.  Because the ref alias set is not always the alias set of the
innermost type, but may be that of one of the refs in the access path (as
decided by component_uses_parent_alias_set_from), we cannot really rely on the
remaining part of the access path to be meaningful in any way except for
offset+size computation.
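The kind of access path involved can be sketched as follows (a hypothetical
illustration, not taken from the patch; the type names are invented):

```c
#include <assert.h>

/* Hypothetical illustration: for the access u->s.i the TBAA-relevant alias
   set may be that of the union reference rather than that of the innermost
   'int' member (as decided by component_uses_parent_alias_set_from), so only
   the offset and size of the tail of the access path can be trusted.  */
struct S { int i; float f; };
union U { struct S s; double d; };

int
read_member (union U *u)
{
  /* Access path: *u -> .s -> .i.  */
  return u->s.i;
}
```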

The patch makes alias.c (which is used by FRE to validate the transform) and
tree-ssa-alias.c share the same logic for ending the part of the access path
relevant for TBAA.  tree-ssa-alias previously ended access paths on
VIEW_CONVERT_EXPR and BIT_FIELD_REF, so it is not hard to wire in a common
predicate.  However, that led to additional issues (I tried to read the code
quite carefully for possible extra fun, so I hope I found them all):

  1) aliasing_component_refs_walk compares base and reference sizes to see
     if one access path may continue by another.  This check can be confused
     by a union containing a structure with a zero sized array.  In this case
     we no longer see the reference to the zero sized array and think that the
     ref size is 0.

     In an access path there can be at most one (valid) trailing/zero sized
     array access, so the sizes in the access path are decreasing, with this
     one exception.  This is already handled by the logic; however, the access
     is not expected to happen past the end of the TBAA segment.  I suppose
     this was a kind of latent problem before, because one can think of an
     access path doing a trailing array access past a VIEW_CONVERT_EXPR, but
     since in C code we don't do VCE and in non-C code we don't do trailing
     arrays, we did not hit the problem.

     I fixed this by tracking whether the trailing array reference appears
     after the end of the TBAA access path, and mostly punting in the second
     case (because we need to support kind of all type punning here).  I do
     not think we can assume much sanity here; in particular, we no longer
     know there is only one such access, because FRE may mix things up.

     An exception is the walk that looks for an occurrence of the base type
     of path1 within the TBAA relevant part of path2.  Here we really care
     about the TBAA relevant parts of the paths and thus do not need to give
     up.

     I broke out the logic into ends_tbaa_access_path_p to avoid duplication
     and to let me stick some detailed comments there.  This became much more
     complex than I originally imagined (still, it is useful to make the
     oracle both faster and more precise).

     Note that the logic in aliasing_component_refs_walk is safe since it
     works on TBAA relevant segments of the paths only.
  2) nonoverlapping_refs_since_match_p uses TBAA only in the corner case
     that the paths got out of sync and a re-synchronization at types of the
     same size is found.  I thus extended it to whole paths (not only the
     TBAA relevant parts) and track whether the TBAA part can still be used
     by counting the number of TBAA relevant refs on the stack.

     I have noticed that in one case we call nonoverlapping_refs_since_match_p
     before checking for view converting MEM_REFs and in others we check
     after.  I think we want to just disable the TBAA part if a view convert
     is in there but still disambiguate.  I will do this incrementally.
  3) nonoverlapping_component_refs_p uses TBAA, so it needs to punt at the
     end of the TBAA path.  It does not deal with sizes and thus does not
     have the issue as in 1).
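The confusing construct from point 1) can be sketched like this (a
hypothetical shape, not the actual testcase; it relies on the GCC
zero-length-array extension):

```c
#include <assert.h>

/* A union containing a structure whose last member is a zero sized array
   (GCC extension).  A reference to u->z.a[0] reports a ref size of 0 even
   though the store lands in storage provided by the other union member.  */
struct Z { int n; int a[0]; };
union U { struct Z z; int storage[4]; };

int
write_and_read (union U *u)
{
  u->z.a[0] = 5;          /* trailing-array access with reported size 0 */
  return u->storage[1];   /* same bytes, seen through the other member */
}
```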

I am also attaching one (most probably) valid C++ testcase (by Mark Williams)
where we incorrectly disambiguated while the code is valid by the common
initial sequence rule.  This happens to be fixed by the same patch.  Here one
access goes through a union and continues along an access path through one
field, while the other access path starts at a different field of the union
with a common initial sequence.  This made aliasing_component_refs_p not find
the overlapping type (because there is none) and disambiguate.  Now we cut
the first access path at the union reference, and this lets us find the path
continuation in aliasing_component_refs_walk.

If FRE is ever made more careful about access paths past the first union
reference (I think that would be a good idea, since unions are quite common
in C++ and we throw away quite useful info), then we will need to teach the
access path oracle about the common initial sequence rule (which, as Mark
pointed out, is part of both the C and C++ standards).
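A minimal sketch of the common-initial-sequence situation described above
(hypothetical types, not Mark's attached testcase):

```c
#include <assert.h>

/* Two union members share an initial 'int tag'.  One access path goes
   through the union and struct A; the other starts directly at struct B.
   C11 6.5.2.3p6 permits inspecting the common initial part anywhere a
   declaration of the completed union type is visible.  */
struct A { int tag; float x; };
struct B { int tag; double y; };
union U { struct A a; struct B b; };

int
read_tag (union U *u, struct B *b)
{
  u->a.tag = 1;     /* path: union -> struct A */
  b->tag = 2;       /* path: struct B directly */
  return u->a.tag;  /* must see 2 when b points into *u */
}
```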

The only argument that could possibly invalidate this testcase is that I do
not see that the standard is clear about the situation where one access path
contains the union but the other starts after the union.

Clearly, if both start after the union reference, we are right to disambiguate
(since there is no union involved).  If both start before the union, then
there is a common initial sequence and by the standard it is defined.  This
case works on current trunk because aliasing_component_refs_p resorts to
base+offset after fin

Re: middle-end: Fix wrong code caused by disagreement between FRE and access path oracle [PR 92152]

2020-02-26 Thread Richard Biener
On Wed, 26 Feb 2020, Jan Hubicka wrote:

> Hi,
> This patch solves problem caused by the disagreement between FRE and access
> path orracle.
> 
> FRE is checking stores for equivalence based on their address, value and
> base+ref alias sets.  Because ref alias set is not always the alias set of
> innermost type, but it may be one of refs in the access path (as decided by
> component_uses_parent_alias_set_from) it means that we can not really rely on
> the remaining part of access path to be meaningful in any way except for
> offset+size computation.
> 
> The patch makes alias (which is used by FRE to validate transform) and
> tree-ssa-alias to share same logic for ending the access path relevant for
> TBAA. tree-ssa-alias previously ended access paths on VIEW_CONVERT_EXPR and
> BIT_FIELD_REF so it is not hard to wire in common predicate.  However it led 
> to
> additional issues (I tried to read the code quite carefully for possible extra
> fun, so I hope I found it all):
> 
>   1) alias_component_refs_walk compares base and reference sizes to see
>  if one access path may continue by another.  This check can be confused
>  by an union containing structure with zero sized array.  In this case we
>  no longer see the refernece to zero sized array and think that ref size
>  is 0.
> 
>  In an access path there can be at most one (valid) trailing/zero sized
>  array access, so the sizes in the access path are decreasing with the
>  this exception. This is already handled by the logic, however the access
>  is not expected to happen past the end of TBAA segment.  I suppose this
>  was kind of latent problem before because one can think of access path
>  doing traling array past VIEW_CONVERT_EXPR, but since in C code we don't
>  VCE and in non-C we don't do trailing arrays, we did not hit the problem.
> 
>  I fixed this by tracking if the trailing array references appearing after
>  the end of TBAA access path and mostly punt in the second case (because 
> we
>  need to support kind of all type puning here). I do not think we can 
> assume
>  much of sanity here, in particular, we no longer know there is only one
>  because FRE may mix things up.
> 
>  An exception is the walk that looks for occurence of basetype of path1
>  within TBAA relevant part of path2.  Here we realy care about TBAA
>  relevant parts of paths and thus do not need to give up.
> 
>  I broke out the logic into ends_tbaa_access_path_p to avoid duplication 
> and
>  to let me stick some detailed comments. This became much more complex
>  than I originally imagined (still it is useful to make oracle both faster
>  and more precise).
> 
>  Note that logic in aliasing_component_refs_walk is safe since it works
>  on TBAA relevant segments of paths only.
>   2) nonoverlapping_refs_since_match_p is using TBAA only in the corner case
>  that the paths got out of sync and re-synchronize of types of same size
>  are found.  I thus extended it to whole paths (not only TBAA relevant
>  parts) and track if the TBAA part can be used by counting of number of
>  TBAA relevant res on the stack.
> 
>  I have noticed that in one case we call nonoverlapping_refs_since_match_p
>  before checking for view converting MEM_REFs and in others we check
>  after.  I think we want to just disable TBAA part if view convert
>  is in there but still disambiguate.  I will do this incrementaly.
>   3) nonoverlapping_component_refs_p uses TBAA so it needs to punt on
>  end of TBAA path. It deals with no sizes and thus there is not the issue
>  as in 1).
> 
> I am also attaching one (most probably) valid C++ testcase (by Mark Williams)
> where we incorrectly disambiguated while the code is valid by the common
> initial sequence rule.  This happens to be fixed by same patch. Here one 
> access
> goes through union and follows by access path trhough one filed, while other
> access path start by different field of the union with common initial 
> sequence.
> This made aliasing_component_refs_p to not find the overlapping type (because
> there is none) and disambiguate.  Now we cut the first access path by the 
> union
> reference and this makes us to find the path continuation in
> alias_component_refs_walk.
> 
> If FRE is ever made more careful about access paths past the fist union
> reference (I think that would be good idea since unions are quite common in 
> C++
> and we throw away quite useful info) then we will need to teach access path
> oracle about the common initial sequence rule (which, as Mark pointed out, is
> part of both C and C++ standards).
> 
> Only argument that can possibly invalidate this testcase is that I do not see
> that stadnard is clear about the situation where one access path contains the
> union but other starts after the union.
> 
> Clearly if both start after the union reference we are right to disambiguate
>

[PATCH] optabs: Don't use scalar conversions for vectors [PR93843]

2020-02-26 Thread Richard Sandiford
In this PR we had a conversion between two integer vectors that
both had scalar integer modes.  We then tried to implement the
conversion using the scalar optab for those modes, instead of
doing the conversion elementwise.

I wondered about letting through scalar modes for single-element
vectors, but I don't have any evidence that that's useful/necessary,
so it seemed better to keep things simple.

Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.
OK to install?

Richard


2020-02-26  Richard Sandiford  

gcc/
PR middle-end/93843
* optabs-tree.c (supportable_convert_operation): Reject types with
scalar modes.

gcc/testsuite/
PR middle-end/93843
* gcc.dg/vect/pr93843-1.c: New test.
* gcc.dg/vect/pr93843-2.c: Likewise.
---
 gcc/optabs-tree.c |  5 +
 gcc/testsuite/gcc.dg/vect/pr93843-1.c | 21 +
 gcc/testsuite/gcc.dg/vect/pr93843-2.c | 11 +++
 3 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr93843-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr93843-2.c

diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 3d829c27826..badd30bfda8 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -284,9 +284,14 @@ supportable_convert_operation (enum tree_code code,
   machine_mode m1,m2;
   bool truncp;
 
+  gcc_assert (VECTOR_TYPE_P (vectype_out) && VECTOR_TYPE_P (vectype_in));
+
   m1 = TYPE_MODE (vectype_out);
   m2 = TYPE_MODE (vectype_in);
 
+  if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
+    return false;
+
   /* First check if we can done conversion directly.  */
   if ((code == FIX_TRUNC_EXPR
&& can_fix_p (m1,m2,TYPE_UNSIGNED (vectype_out), &truncp)
diff --git a/gcc/testsuite/gcc.dg/vect/pr93843-1.c b/gcc/testsuite/gcc.dg/vect/pr93843-1.c
new file mode 100644
index 000..23a79ca4c96
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr93843-1.c
@@ -0,0 +1,21 @@
+char a;
+struct S { short b, c; } d;
+
+__attribute__((noipa)) void
+foo (int x)
+{
+  if (x != 4)
+    __builtin_abort ();
+}
+
+int
+main ()
+{
+  short *g = &d.c, *h = &d.b;
+  char e = 4 - a;
+  int f;
+  *h = *g = e;
+  for (f = 0; f < 2; f++)
+    foo (d.c);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr93843-2.c b/gcc/testsuite/gcc.dg/vect/pr93843-2.c
new file mode 100644
index 000..5fae3e5be17
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr93843-2.c
@@ -0,0 +1,11 @@
+char in[2] = {2, 2};
+short out[2] = {};
+
+int
+main()
+{
+  for (int i = 0; i < 2; ++i)
+    out[i] = in[i];
+  asm("":::"memory");
+  if (out[0] != 2) __builtin_abort();
+}


[committed] analyzer: improvements to logging/dumping

2020-02-26 Thread David Malcolm
This patch adds various information to -fdump-analyzer and
-fdump-analyzer-stderr to make it easier to track down
problems with state explosions in the exploded_graph.

It logs the number of unprocessed nodes in the worklist, for
the case where the upper limit on exploded nodes is reached.

It prints:
[a] a bar chart showing the number of exploded nodes by function, and

[b] bar charts for each function showing the number of exploded nodes
per supernode/BB, and

[c] bar charts for each function showing the number of excess exploded
nodes per supernode/BB beyond the limit
(--param=analyzer-max-enodes-per-program-point), where that limit
was reached

I've found these helpful in finding exactly where we fail to consolidate
state, leading to state explosions and false negatives due to the
thresholds being reached.

The patch also adds a "superedge::dump" member function I found myself
needing.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to master as 67fa274cd635ec3c8af635294b67f09e45e3c56a.

gcc/ChangeLog:
* Makefile.in (ANALYZER_OBJS): Add analyzer/bar-chart.o.

gcc/analyzer/ChangeLog:
* bar-chart.cc: New file.
* bar-chart.h: New file.
* engine.cc: Include "analyzer/bar-chart.h".
(stats::log): Only log the m_num_nodes kinds that are non-zero.
(stats::dump): Likewise when dumping.
(stats::get_total_enodes): New.
(exploded_graph::get_or_create_node): Increment the per-point-data
m_excess_enodes when hitting the per-program-point limit on
enodes.
(exploded_graph::print_bar_charts): New.
(exploded_graph::log_stats): Log the number of unprocessed enodes
in the worklist.  Call print_bar_charts.
(exploded_graph::dump_stats): Print the number of unprocessed
enodes in the worklist.
* exploded-graph.h (stats::get_total_enodes): New decl.
(struct per_program_point_data): Add field m_excess_enodes.
(exploded_graph::print_bar_charts): New decl.
* supergraph.cc (superedge::dump): New.
(superedge::dump): New.
* supergraph.h (supernode::get_function): New.
(superedge::dump): New decl.
(superedge::dump): New decl.
---
 gcc/Makefile.in   |   1 +
 gcc/analyzer/bar-chart.cc | 102 +++
 gcc/analyzer/bar-chart.h  |  60 
 gcc/analyzer/engine.cc| 125 --
 gcc/analyzer/exploded-graph.h |   9 ++-
 gcc/analyzer/supergraph.cc|  23 +++
 gcc/analyzer/supergraph.h |   4 ++
 7 files changed, 317 insertions(+), 7 deletions(-)
 create mode 100644 gcc/analyzer/bar-chart.cc
 create mode 100644 gcc/analyzer/bar-chart.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index b1423d1dbfd..fa9923bb270 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1221,6 +1221,7 @@ ANALYZER_OBJS = \
analyzer/analyzer-logging.o \
analyzer/analyzer-pass.o \
analyzer/analyzer-selftests.o \
+   analyzer/bar-chart.o \
analyzer/call-string.o \
analyzer/checker-path.o \
analyzer/constraint-manager.o \
diff --git a/gcc/analyzer/bar-chart.cc b/gcc/analyzer/bar-chart.cc
new file mode 100644
index 000..d0e30b68d0f
--- /dev/null
+++ b/gcc/analyzer/bar-chart.cc
@@ -0,0 +1,102 @@
+/* Support for plotting bar charts in dumps.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "pretty-print.h"
+#include "analyzer/bar-chart.h"
+
+#if ENABLE_ANALYZER
+
+namespace ana {
+
+/* class bar_chart.  */
+
+/* Add an item, taking a copy of NAME.  */
+
+void
+bar_chart::add_item (const char *name, value_t value)
+{
+  m_items.safe_push (new item (name, value));
+}
+
+/* Print the data to PP.  */
+
+void
+bar_chart::print (pretty_printer *pp) const
+{
+  /* Get maximum printing widths and maximum value.  */
+  size_t max_width_name = 0;
+  size_t max_width_value = 0;
+  value_t max_value = 0;
+  unsigned i;
+  item *item;
+  char digit_buffer[128];
+  FOR_EACH_VEC_ELT (m_items, i, item)
+{
+  max_width_name = MAX (max_width_name, item->m_strlen);
+  sprintf (digit_buffer, "%li", item->m_value);
+  max_width_value = MAX (max_width_value, strl

Re: [PATCH] optabs: Don't use scalar conversions for vectors [PR93843]

2020-02-26 Thread Richard Biener
On Wed, Feb 26, 2020 at 12:43 PM Richard Sandiford
 wrote:
>
> In this PR we had a conversion between two integer vectors that
> both had scalar integer modes.  We then tried to implement the
> conversion using the scalar optab for those modes, instead of
> doing the conversion elementwise.
>
> I wondered about letting through scalar modes for single-element
> vectors, but I don't have any evidence that that's useful/necessary,
> so it seemed better to keep things simple.
>
> Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.
> OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> 2020-02-26  Richard Sandiford  
>
> gcc/
> PR middle-end/93843
> * optabs-tree.c (supportable_convert_operation): Reject types with
> scalar modes.
>
> gcc/testsuite/
> PR middle-end/93843
> * gcc.dg/vect/pr93843-1.c: New test.
> * gcc.dg/vect/pr93843-2.c: Likewise.
> ---
>  gcc/optabs-tree.c |  5 +
>  gcc/testsuite/gcc.dg/vect/pr93843-1.c | 21 +
>  gcc/testsuite/gcc.dg/vect/pr93843-2.c | 11 +++
>  3 files changed, 37 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr93843-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr93843-2.c
>
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index 3d829c27826..badd30bfda8 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -284,9 +284,14 @@ supportable_convert_operation (enum tree_code code,
>machine_mode m1,m2;
>bool truncp;
>
> +  gcc_assert (VECTOR_TYPE_P (vectype_out) && VECTOR_TYPE_P (vectype_in));
> +
>m1 = TYPE_MODE (vectype_out);
>m2 = TYPE_MODE (vectype_in);
>
> +  if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
> +return false;
> +
>/* First check if we can done conversion directly.  */
>if ((code == FIX_TRUNC_EXPR
> && can_fix_p (m1,m2,TYPE_UNSIGNED (vectype_out), &truncp)
> diff --git a/gcc/testsuite/gcc.dg/vect/pr93843-1.c 
> b/gcc/testsuite/gcc.dg/vect/pr93843-1.c
> new file mode 100644
> index 000..23a79ca4c96
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr93843-1.c
> @@ -0,0 +1,21 @@
> +char a;
> +struct S { short b, c; } d;
> +
> +__attribute__((noipa)) void
> +foo (int x)
> +{
> +  if (x != 4)
> +__builtin_abort ();
> +}
> +
> +int
> +main ()
> +{
> +  short *g = &d.c, *h = &d.b;
> +  char e = 4 - a;
> +  int f;
> +  *h = *g = e;
> +  for (f = 0; f < 2; f++)
> +foo (d.c);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr93843-2.c 
> b/gcc/testsuite/gcc.dg/vect/pr93843-2.c
> new file mode 100644
> index 000..5fae3e5be17
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr93843-2.c
> @@ -0,0 +1,11 @@
> +char in[2] = {2, 2};
> +short out[2] = {};
> +
> +int
> +main()
> +{
> +  for (int i = 0; i < 2; ++i)
> +out[i] = in[i];
> +  asm("":::"memory");
> +  if (out[0] != 2) __builtin_abort();
> +}


Re: middle-end: Fix wrong code caused by disagreement between FRE and access path oracle [PR 92152]

2020-02-26 Thread Jan Hubicka
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> OK and thanks for the elaborate write-up and comments in the code ;)

Unfortunately, our discussion on IRC led me to construct another wrong code
testcase based on mixing up base alias sets:

union U { long long i; long f; };
struct a {union U u;};
struct aa {struct a a;};
struct b {union U u;};
struct bb {struct b b;};
 
long
foo (struct bb *bv, void *ptr)
{
  struct aa *a = ptr;
  struct bb *b = ptr;
  bv->b.u.f = 1;
  a->a.u.i = 0;
  b->b.u.f = 0;
  return bv->b.u.f;
}
 
int
main ()
{
  union C {struct aa aa; struct bb bb;} v;
  if (foo (&v.bb, &v) != 0)
    __builtin_abort ();
  return 0;
}

I think there is no undefined behaviour here (at least if we assume that the
first store to v.aa changes the type implied by the earlier store to v.bb).

Honza


Re: [PATCH] optabs: Don't use scalar conversions for vectors [PR93843]

2020-02-26 Thread Jakub Jelinek
On Wed, Feb 26, 2020 at 11:43:10AM +, Richard Sandiford wrote:
> In this PR we had a conversion between two integer vectors that
> both had scalar integer modes.  We then tried to implement the
> conversion using the scalar optab for those modes, instead of
> doing the conversion elementwise.
> 
> I wondered about letting through scalar modes for single-element
> vectors, but I don't have any evidence that that's useful/necessary,
> so it seemed better to keep things simple.
> 
> Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.

Won't this prevent even say __v4qi to __v4uqi and similar conversions
with scalar modes for those where we don't need any kind of extensions,
just reinterpret the bits?
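The conversions Jakub refers to can be sketched with the GCC vector
extension; the typedef names mirror the __v4qi/__v4uqi from the mail but are
defined locally here (an illustration, not part of the patch):

```c
#include <assert.h>

typedef signed char v4qi __attribute__ ((vector_size (4)));
typedef unsigned char v4uqi __attribute__ ((vector_size (4)));

/* Same element width, sign change only: no element extension is needed,
   the lanes are just reinterpreted modulo 2^8.  */
v4uqi
to_unsigned (v4qi x)
{
  return __builtin_convertvector (x, v4uqi);
}
```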

Jakub



Re: GLIBC libmvec status

2020-02-26 Thread Tulio Magno Quites Machado Filho
Jakub Jelinek  writes:

> Can you please explain how do you want to pass the
> void sincos (double, double *, double *);
> arguments?  I must say it isn't entirely clear from the document.
> You talk there about double[2], but sincos certainly doesn't have such an
> argument.

The plan [1] is to return a struct instead, i.e.:

struct sincosret _ZGVbN2v_sincos (vector double);
struct sincosretf _ZGVbN4v_sincosf (vector float);

Notice however, that change is still missing [2] from the libmvec patch
series [3].

[1] https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
[2] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.c;hb=refs/heads/tuliom/libmvec
[3] https://sourceware.org/git/?p=glibc.git;a=log;h=refs/heads/tuliom/libmvec

-- 
Tulio Magno


[PATCH] libstdc++: Fix use of inaccessible private member in split_view (PR93936)

2020-02-26 Thread Patrick Palka
We are calling _OuterIter::__current from _InnerIter::operator==, but the former
is private within this non-member friend.  Fix this by calling
_OuterIter::operator== instead, which does the right thing here.

libstdc++-v3/ChangeLog:

PR libstdc++/93936
* include/std/ranges (split_view::_InnerIter::operator==): Compare
the operands' _M_i rather than their _M_i.current().
* testsuite/std/ranges/adaptors/split.cc: Augment test.
---
 libstdc++-v3/include/std/ranges|  2 +-
 .../testsuite/std/ranges/adaptors/split.cc | 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 1bc95039cf2..62c637f35e0 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -2837,7 +2837,7 @@ namespace views
  friend constexpr bool
  operator==(const _InnerIter& __x, const _InnerIter& __y)
requires forward_range<_Base>
- { return __x._M_i.__current() == __y._M_i.__current(); }
+ { return __x._M_i == __y._M_i; }
 
  friend constexpr bool
  operator==(const _InnerIter& __x, default_sentinel_t)
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc b/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
index e25031c0aea..52b015cf0c6 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
@@ -91,6 +91,23 @@ test04()
   VERIFY( i == v.end() );
 }
 
+void
+test05()
+{
+  auto as_string = [](ranges::view auto rng) {
+    auto in = rng | views::common;
+    return std::string(in.begin(), in.end());
+  };
+  std::string str
+    = "Now is the time for all good men to come to the aid of their county.";
+  auto rng
+    = str | views::split(' ') | views::transform(as_string) | views::common;
+  std::vector words(rng.begin(), rng.end());
+  auto not_space_p = [](char c) { return c != ' '; };
+  VERIFY( ranges::equal(words | views::join,
+   str | views::filter(not_space_p)) );
+}
+
 int
 main()
 {
@@ -98,4 +115,5 @@ main()
   test02();
   test03();
   test04();
+  test05();
 }
-- 
2.25.1.377.g2d2118b814



[PATCH] coroutines: Amend parameter handling to match n4849.

2020-02-26 Thread Iain Sandoe
This is the second in the series to bring the GCC implementation into line
with the current standard.

@JunMa
  I believe that this should solve the problems you were looking at in
  “[PATCH Coroutines] Fix issue with unused corutine function parameters”
  and supersedes that patch (since we needed to alter the ordering as well).

  If any parameter issues remain, please file a PR (or a patch, of course).

OK for trunk?
thanks
Iain


=

In n4849 and preceding versions, [class.copy.elision] (1.3)
appears to confer additional permissions on coroutines to elide
parameter copies.

After considerable discussion on this topic by email and during
the February 2020 WG21 meeting, it has been determined that there
are no additional permissions applicable to coroutine parameter
copy elision.

The content of that clause in the standard is expected to be amended
eventually to clarify this.  Other than this, the handling of
parameter lifetimes is expected to be as per n4849:

 * A copy is made before the promise is constructed
 * If the promise CTOR uses the parms, then it should use the copy
   where appropriate.
 * The param copy lifetimes end after the promise is destroyed
   (during the coroutine frame destruction).
 * Otherwise, C++20 copy elision rules apply.

(as an aside) In practice, we expect that copy elision can only occur
when the coroutine body is fully inlined, possibly in conjunction with
heap allocation elision.

The patch:
 * Reorders the copying process to precede the promise CTOR and
   ensures the correct use.
 * Copies all params into the frame regardless of whether the coro
   body uses them (this is a bit unfortunate, and we should figure
   out an amendment for C++23).

gcc/cp/ChangeLog:

2020-02-26  Iain Sandoe  

* coroutines.cc (struct param_info): Keep track of params
that are references, and cache the original type and whether
the DTOR is trivial.
(build_actor_fn): Handle param copies always, and adjust the
handling for references.
(register_param_uses): Only handle uses here.
(classtype_has_non_deleted_copy_ctor): New.
(morph_fn_to_coro): Adjust param copy handling to match n4849
by reordering ahead of the promise CTOR and always making a
frame copy, even if the param is unused in the coroutine body.

gcc/testsuite/ChangeLog:

2020-02-26  Iain Sandoe  

* g++.dg/coroutines/coro1-refs-and-ctors.h: New.
* g++.dg/coroutines/torture/func-params-07.C: New test.
* g++.dg/coroutines/torture/func-params-08.C: New test.
---
 gcc/cp/coroutines.cc  | 313 +++---
 .../g++.dg/coroutines/coro1-refs-and-ctors.h  | 144 
 .../coroutines/torture/func-params-07.C   |  81 +
 .../coroutines/torture/func-params-08.C   | 112 +++
 4 files changed, 524 insertions(+), 126 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/coro1-refs-and-ctors.h
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/func-params-07.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/func-params-08.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 524d4872804..e0e7e66fe5e 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1742,6 +1742,11 @@ struct param_info
   tree field_id;
   vec *body_uses;
   tree frame_type;
+  tree orig_type;
+  bool by_ref;
+  bool rv_ref;
+  bool pt_ref;
+  bool trivial_dtor;
 };
 
 struct local_var_info
@@ -1941,26 +1946,37 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
 
   /* Re-write param references in the body, no code should be generated
  here.  */
-  if (DECL_ARGUMENTS (orig) && param_uses != NULL)
+  if (DECL_ARGUMENTS (orig))
 {
   tree arg;
   for (arg = DECL_ARGUMENTS (orig); arg != NULL; arg = DECL_CHAIN (arg))
{
  bool existed;
  param_info &parm = param_uses->get_or_insert (arg, &existed);
- if (parm.field_id == NULL_TREE)
-   continue; /* Wasn't used.  */
+ if (!parm.body_uses)
+   continue; /* Wasn't used in the original function body.  */
+
  tree fld_ref = lookup_member (coro_frame_type, parm.field_id,
/*protect=*/1, /*want_type=*/0,
tf_warning_or_error);
- tree fld_idx = build3_loc (loc, COMPONENT_REF, TREE_TYPE (arg),
+ tree fld_idx = build3_loc (loc, COMPONENT_REF, parm.frame_type,
 actor_frame, fld_ref, NULL_TREE);
+
+ /* We keep these in the frame as a regular pointer, so convert that
+  back to the type expected.  */
+ if (parm.pt_ref)
+   fld_idx = build1_loc (loc, CONVERT_EXPR, TREE_TYPE (arg), fld_idx);
+
+ /* We expect an rvalue ref. here.  */
+ if (parm.rv_ref)
+   fld_idx = convert_to_reference (DECL_ARG_TYPE (arg), fld_idx,
+

Re: GLIBC libmvec status

2020-02-26 Thread Jakub Jelinek
On Wed, Feb 26, 2020 at 10:32:17AM -0300, Tulio Magno Quites Machado Filho 
wrote:
> Jakub Jelinek  writes:
> 
> > Can you please explain how do you want to pass the
> > void sincos (double, double *, double *);
> > arguments?  I must say it isn't entirely clear from the document.
> > You talk there about double[2], but sincos certainly doesn't have such an
> > argument.
> 
> The plan [1] is to return a struct instead, i.e.:
> 
> struct sincosret _ZGVbN2v_sincos (vector double);
> struct sincosretf _ZGVbN4v_sincosf (vector float);

Ugh, then certainly it shouldn't be mangled as simd variant of sincos, it
needs to be called something else.
The ABI can't be written for a single, even if commonly used, function, but
needs to be generic.
And if I have a
#pragma omp declare simd
void foo (double x, double *y, double *z);
then from the prototype there is no way to find out that it only uses the
second two arguments to store a single double through them.  It could very
well do
void foo (double x, double *y, double *z) {
  y[0] = y[1] + x;
  z[0] = z[1] + x;
}
or anything else and then you can't transform it to something like that.

Jakub



Re: GLIBC libmvec status

2020-02-26 Thread Richard Biener
On Wed, Feb 26, 2020 at 2:46 PM Jakub Jelinek  wrote:
>
> On Wed, Feb 26, 2020 at 10:32:17AM -0300, Tulio Magno Quites Machado Filho 
> wrote:
> > Jakub Jelinek  writes:
> >
> > > Can you please explain how do you want to pass the
> > > void sincos (double, double *, double *);
> > > arguments?  I must say it isn't entirely clear from the document.
> > > You talk there about double[2], but sincos certainly doesn't have such an
> > > argument.
> >
> > The plan [1] is to return a struct instead, i.e.:
> >
> > struct sincosret _ZGVbN2v_sincos (vector double);
> > struct sincosretf _ZGVbN4v_sincosf (vector float);
>
> Ugh, then certainly it shouldn't be mangled as simd variant of sincos, it
> needs to be called something else.
> The ABI can't be written for a single, even if commonly used, function, but
> needs to be generic.
> And if I have a
> #pragma omp declare simd
> void foo (double x, double *y, double *z);
> then from the prototype there is no way to find out that it only uses the
> second two arguments to store a single double through them.  It could very
> well do
> void foo (double x, double *y, double *z) {
>   y[0] = y[1] + x;
>   z[0] = z[1] + x;
> }
> or anything else and then you can't transform it to something like that.

Yeah.  I think you have to approach the sincos case from our internal
__builtin_cexpi which means

_Complex double foo (double);

and how that represents itself with OpenMP SIMD.
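A minimal sketch of the value-returning representation Richard refers to: GCC's internal `__builtin_cexpi` models sincos as `cexpi(x) == cos(x) + i*sin(x)`. Here `my_cexpi` is an illustrative stand-in, not the builtin itself:

```cpp
#include <cmath>
#include <complex>

// Value-returning sincos: both results come back in one return value,
// mirroring GCC's internal __builtin_cexpi (_Complex double foo (double)).
std::complex<double> my_cexpi(double x)
{
  return {std::cos(x), std::sin(x)};
}
```

Because nothing is stored through pointers, a form like this is straightforward for a vectorizer to reason about.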

Richard.


> Jakub
>


[PATCH] dump load permutations and refcount per SLP node

2020-02-26 Thread Richard Biener
This adjusts dumping, which has proved useful in debugging.

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2020-02-26  Richard Biener  

* tree-vect-slp.c (vect_print_slp_tree): Also dump ref count
and load permutation.
---
 gcc/tree-vect-slp.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 077c7946fee..c7ddd94b39f 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1641,7 +1641,7 @@ static void
 vect_print_slp_tree (dump_flags_t dump_kind, dump_location_t loc,
 slp_tree node, hash_set &visited)
 {
-  unsigned i;
+  unsigned i, j;
   stmt_vec_info stmt_info;
   slp_tree child;
   tree op;
@@ -1651,13 +1651,13 @@ vect_print_slp_tree (dump_flags_t dump_kind, 
dump_location_t loc,
 
   dump_metadata_t metadata (dump_kind, loc.get_impl_location ());
   dump_user_location_t user_loc = loc.get_user_location ();
-  dump_printf_loc (metadata, user_loc, "node%s %p (max_nunits=%u)\n",
+  dump_printf_loc (metadata, user_loc, "node%s %p (max_nunits=%u, refcnt=%u)\n",
   SLP_TREE_DEF_TYPE (node) == vect_external_def
   ? " (external)"
   : (SLP_TREE_DEF_TYPE (node) == vect_constant_def
  ? " (constant)"
  : ""), node,
-  estimated_poly_value (node->max_nunits));
+  estimated_poly_value (node->max_nunits), node->refcnt);
   if (SLP_TREE_SCALAR_STMTS (node).exists ())
 FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
   dump_printf_loc (metadata, user_loc, "\tstmt %u %G", i, stmt_info->stmt);
@@ -1669,6 +1669,13 @@ vect_print_slp_tree (dump_flags_t dump_kind, 
dump_location_t loc,
 i < SLP_TREE_SCALAR_OPS (node).length () - 1 ? "," : "");
   dump_printf (metadata, "}\n");
 }
+  if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
+{
+  dump_printf_loc (metadata, user_loc, "\tload permutation {");
+  FOR_EACH_VEC_ELT (SLP_TREE_LOAD_PERMUTATION (node), i, j)
+   dump_printf (dump_kind, " %u", j);
+  dump_printf (dump_kind, " }\n");
+}
   if (SLP_TREE_CHILDREN (node).is_empty ())
 return;
   dump_printf_loc (metadata, user_loc, "\tchildren");
-- 
2.25.1


Re: GLIBC libmvec status

2020-02-26 Thread Bill Schmidt

On 2/26/20 2:18 AM, Jakub Jelinek wrote:

On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:

The reason that homogeneous aggregates matter (at least somewhat) is that
the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
mangling formula that includes the length of parameters and return values.
Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
has a superior one, we chose to use ELFv2's calling convention and make use
of homogeneous aggregates for return values in registers for the case of
vectorized sincos.

Can you please explain how do you want to pass the
void sincos (double, double *, double *);
arguments?  I must say it isn't entirely clear from the document.
You talk there about double[2], but sincos certainly doesn't have such an
argument.


The hope is that we can create a vectorized version that returns values
in registers rather than the by-ref parameters, and add code to GCC to
copy things around correctly following the call.  Ideally the signature of
the vectorized version would be sth like

  struct retval {vector double, vector double};
  retval vecsincos (vector double);

In the typical case where calls to sincos are of the form

  sincos (val[i], &sinval[i], &cosval[i]);

this would allow us to only store the values in the caller upon return,
rather than store them in the callee and potentially reload them
immediately in the caller.  On some Power CPUs, the latter behavior can
result in somewhat costly stalls if the consecutive accesses hit a timing
window.

If you feel it isn't possible to do this, then we can abandon it.  Right
now my understanding is that GCC doesn't vectorize calls to sincos yet
for any targets, so it would be moot except that we really should define
what happens for the future.

This calling convention would also be useful in the future for vectorizing
functions that return complex values either by value or by reference.
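The proposed by-value convention can be sketched in scalar C++ as below. The names `retval` and `vecsincos2` are hypothetical illustrations of the idea; a real implementation would return the struct in vector registers, not plain arrays:

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical struct-return variant of a 2-lane sincos: both result
// vectors are returned by value, so only the caller performs stores.
struct retval { double sinv[2]; double cosv[2]; };

retval vecsincos2(const double (&x)[2])
{
  retval r{};
  for (std::size_t i = 0; i < 2; ++i)
    {
      r.sinv[i] = std::sin(x[i]);   // sin lane
      r.cosv[i] = std::cos(x[i]);   // cos lane
    }
  return r;  // values come back in registers under the proposed ABI
}
```

This is the shape that avoids the callee store / caller reload round trip described above.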



Also, I'd say ignoring the masked variants is a mistake, are you going to
warn any time the user uses inbranch or even doesn't specify notinbranch?
The masking can be implemented even without highly specialized instructions,
e.g. on x86 only AVX512F has full masking support, for older ISAs all that
is there is conditional store or e.g. for integral operations that can't
trap/raise exceptions just doing blend-like operations (or even and/or) is
all that is needed; just let the vectorizer do its job.


Well, as a matter of practicality, we don't have any of that implemented
in the rs6000 back end, and we don't have any free resources to do that
in GCC 11.  Is there any documentation about what needs to be done to
support this?  I've always been under the impression that vectorizing for
masking when there isn't any hardware support is a losing proposition, so
we've not investigated it.

Thanks,
Bill



Even if you don't want it for libmvec, just use
__attribute__((simd ("notinbranch")))
for those, but allow the user to use it where it makes sense.

Jakub



Re: [PATCH] coroutines: Amend parameter handling to match n4849.

2020-02-26 Thread Nathan Sidwell

On 2/26/20 8:43 AM, Iain Sandoe wrote:

This is the second in the series to bring the GCC implementation into line
with the current standard.




2020-02-26  Iain Sandoe  

* coroutines.cc (struct param_info): Keep track of params
that are references, and cache the original type and whether
the DTOR is trivial.
(build_actor_fn): Handle param copies always, and adjust the
handling for references.
(register_param_uses): Only handle uses here.
(classtype_has_non_deleted_copy_ctor): New.
(morph_fn_to_coro): Adjust param copy handling to match n4849
by reordering ahead of the promise CTOR and always making a
frame copy, even if the param is unused in the coroutine body.


Ok, modulo a couple of changes:


@@ -1742,6 +1742,11 @@ struct param_info
tree field_id;
vec *body_uses;
tree frame_type;
+  tree orig_type;
+  bool by_ref;
+  bool rv_ref;
+  bool pt_ref;
+  bool trivial_dtor;
  };


Can you comment the fields here?


+bool
+classtype_has_non_deleted_copy_ctor (tree t)
+{
+  if (CLASSTYPE_LAZY_COPY_CTOR (t))
+lazily_declare_fn (sfk_copy_constructor, t);
+  for (ovl_iterator iter (CLASSTYPE_CONSTRUCTORS (t)); iter; ++iter)
+if (copy_fn_p (*iter) && !DECL_DELETED_FN (*iter))
+  return true;
+  return false;
+}


Please put this in class.c next to the non deleted move ctor variant. 
Needs a comment.


Ok with those changes, no need to re-review.

nathan

--
Nathan Sidwell


Re: [PATCH] libstdc++: Fix use of inaccessible private member in split_view (PR93936)

2020-02-26 Thread Jonathan Wakely

On 26/02/20 08:37 -0500, Patrick Palka wrote:

We are calling _OuterIter::__current from _InnerIter::operator==, but the former
is private within this non-member friend.  Fix this by calling
_OuterIter::operator== instead, which does the right thing here.

libstdc++-v3/ChangeLog:

PR libstdc++/93936
* include/std/ranges (split_view::_InnerIter::operator==): Compare
the operands' _M_i rather than their _M_i.current().
* testsuite/std/ranges/adaptors/split.cc: Augment test.


OK thanks.



Re: [PATCH] libstdc++: P1645R1 constexpr for algorithms

2020-02-26 Thread Jonathan Wakely

On 25/02/20 15:36 -0500, Patrick Palka wrote:

This adds constexpr to 11 algorithms defined in <numeric> as per P1645R1.

Tested on x86_64-pc-linux-gnu, OK to commit?
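A small usage sketch of what P1645R1 enables once this patch is in; with `-std=c++20` the whole computation can run in a constant expression (the `__cplusplus` guard only keeps the example buildable under older standards):

```cpp
#include <array>
#include <numeric>

// With P1645R1 (C++20), iota and accumulate are usable in constexpr context,
// e.g.  constexpr int s = sum_1_to_4();
#if __cplusplus >= 202002L
constexpr
#endif
int sum_1_to_4()
{
  std::array<int, 4> a{};
  std::iota(a.begin(), a.end(), 1);               // a = {1, 2, 3, 4}
  return std::accumulate(a.begin(), a.end(), 0);  // 10
}
```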

libstdc++-v3/ChangeLog:

P1645R1 constexpr for <numeric> algorithms
* include/bits/stl_numeric.h (iota, accumulate, inner_product,
partial_sum, adjacent_difference): Make conditionally constexpr for
C++20.
* include/std/numeric (reduce, transform_reduce, exclusive_scan,
inclusive_scan, transform_exclusive_scan, transform_inclusive_scan):
Likewise.  Define the feature test macro __cpp_lib_constexpr_numeric.
* testsuite/26_numerics/accumulate/constexpr.cc: New test.
* testsuite/26_numerics/adjacent_difference/constexpr.cc: Likewise.
* testsuite/26_numerics/exclusive_scan/constexpr.cc: Likewise.
* testsuite/26_numerics/inclusive_scan/constexpr.cc: Likewise.
* testsuite/26_numerics/inner_product/constexpr.cc: Likewise.
* testsuite/26_numerics/iota/constexpr.cc: Likewise.
* testsuite/26_numerics/partial_sum/constexpr.cc: Likewise.
* testsuite/26_numerics/reduce/constexpr.cc: Likewise.
* testsuite/26_numerics/transform_exclusive_scan/constexpr.cc: Likewise.
* testsuite/26_numerics/transform_inclusive_scan/constexpr.cc: Likewise.
* testsuite/26_numerics/transform_reduce/constexpr.cc: Likewise.
---
libstdc++-v3/include/bits/stl_numeric.h   |  9 +++
libstdc++-v3/include/std/numeric  | 18 ++
.../26_numerics/accumulate/constexpr.cc   | 42 ++
.../adjacent_difference/constexpr.cc  | 44 +++
.../26_numerics/exclusive_scan/constexpr.cc   | 44 +++
.../26_numerics/inclusive_scan/constexpr.cc   | 55 +++
.../26_numerics/inner_product/constexpr.cc| 45 +++
.../testsuite/26_numerics/iota/constexpr.cc   | 31 +++
.../26_numerics/partial_sum/constexpr.cc  | 44 +++
.../testsuite/26_numerics/reduce/constexpr.cc | 52 ++
.../transform_exclusive_scan/constexpr.cc | 33 +++
.../transform_inclusive_scan/constexpr.cc | 44 +++
.../26_numerics/transform_reduce/constexpr.cc | 55 +++
13 files changed, 516 insertions(+)
create mode 100644 libstdc++-v3/testsuite/26_numerics/accumulate/constexpr.cc
create mode 100644 
libstdc++-v3/testsuite/26_numerics/adjacent_difference/constexpr.cc
create mode 100644 
libstdc++-v3/testsuite/26_numerics/exclusive_scan/constexpr.cc
create mode 100644 
libstdc++-v3/testsuite/26_numerics/inclusive_scan/constexpr.cc
create mode 100644 libstdc++-v3/testsuite/26_numerics/inner_product/constexpr.cc
create mode 100644 libstdc++-v3/testsuite/26_numerics/iota/constexpr.cc
create mode 100644 libstdc++-v3/testsuite/26_numerics/partial_sum/constexpr.cc
create mode 100644 libstdc++-v3/testsuite/26_numerics/reduce/constexpr.cc
create mode 100644 
libstdc++-v3/testsuite/26_numerics/transform_exclusive_scan/constexpr.cc
create mode 100644 
libstdc++-v3/testsuite/26_numerics/transform_inclusive_scan/constexpr.cc
create mode 100644 
libstdc++-v3/testsuite/26_numerics/transform_reduce/constexpr.cc

diff --git a/libstdc++-v3/include/bits/stl_numeric.h 
b/libstdc++-v3/include/bits/stl_numeric.h
index dd4683fe92e..f95c86a0d48 100644
--- a/libstdc++-v3/include/bits/stl_numeric.h
+++ b/libstdc++-v3/include/bits/stl_numeric.h
@@ -83,6 +83,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   *  @ingroup numeric_ops
   */
  template
+_GLIBCXX20_CONSTEXPR
void
iota(_ForwardIterator __first, _ForwardIterator __last, _Tp __value)
{
@@ -128,6 +129,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   *  @return  The final sum.
   */
  template
+_GLIBCXX20_CONSTEXPR
inline _Tp
accumulate(_InputIterator __first, _InputIterator __last, _Tp __init)
{
@@ -154,6 +156,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   *  @return  The final sum.
   */
  template
+_GLIBCXX20_CONSTEXPR
inline _Tp
accumulate(_InputIterator __first, _InputIterator __last, _Tp __init,
   _BinaryOperation __binary_op)
@@ -182,6 +185,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   *  @return  The final inner product.
   */
  template
+_GLIBCXX20_CONSTEXPR
inline _Tp
inner_product(_InputIterator1 __first1, _InputIterator1 __last1,
  _InputIterator2 __first2, _Tp __init)
@@ -214,6 +218,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   */
  template
+_GLIBCXX20_CONSTEXPR
inline _Tp
inner_product(_InputIterator1 __first1, _InputIterator1 __last1,
  _InputIterator2 __first2, _Tp __init,
@@ -246,6 +251,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   *  @return  Iterator pointing just beyond the values written to __result.
   */
  template
+_GLIBCXX20_CONSTEXPR
_OutputIterator
partial_sum(_InputIterator __first, _InputIterator __last,
_OutputIterator __result)
@@ -287,6 +293,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   */
  template
+_GLIBCXX20_CONS

Re: GLIBC libmvec status

2020-02-26 Thread Jakub Jelinek
On Wed, Feb 26, 2020 at 07:55:53AM -0600, Bill Schmidt wrote:
> The hope is that we can create a vectorized version that returns values
> in registers rather than the by-ref parameters, and add code to GCC to
> copy things around correctly following the call.  Ideally the signature of
> the vectorized version would be sth like
> 
>   struct retval {vector double, vector double};
>   retval vecsincos (vector double);
> 
> In the typical case where calls to sincos are of the form
> 
>   sincos (val[i], &sinval[i], &cosval[i]);
> 
> this would allow us to only store the values in the caller upon return,
> rather than store them in the callee and potentially reload them
> immediately in the caller.  On some Power CPUs, the latter behavior can
> result in somewhat costly stalls if the consecutive accesses hit a timing
> window.

But can't you do
#pragma omp declare simd linear(sinp, cosp)
void sincos (double x, double *sinp, double *cosp);
?
That is something the vectorizer code could handle and for
  for (int i = 0; i < 1024; i++)
sincos (val[i], &sinval[i], &cosval[i]);
just vectorize it as
  for (int i = 0; i < 1024; i += vf)
_ZGVbN8vl8l8_sincos (*(vector double *)&val[i], &sinval[i], &cosval[i]);
Anything else will need specialized code to handle sincos specially in the
vectorizer.
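Jakub's suggested annotation can be sketched as follows. `my_sincos` is an illustrative stand-in for libm's sincos, and the mangled clone name in his mail (`_ZGVbN8vl8l8_sincos`) is what a vectorizer could target for the loop shape shown; the pragma is simply ignored when OpenMP SIMD support is not enabled:

```cpp
#include <cmath>

// linear(sinp, cosp): each lane stores through consecutive addresses,
// which is exactly the information the vectorizer needs.
#pragma omp declare simd linear(sinp, cosp)
void my_sincos(double x, double *sinp, double *cosp)
{
  *sinp = std::sin(x);
  *cosp = std::cos(x);
}

// The call shape from the discussion: vectorizable as one clone call per vf.
void compute(const double *val, double *sinval, double *cosval, int n)
{
  for (int i = 0; i < n; i++)
    my_sincos(val[i], &sinval[i], &cosval[i]);
}
```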

> If you feel it isn't possible to do this, then we can abandon it.  Right
> now my understanding is that GCC doesn't vectorize calls to sincos yet
> for any targets, so it would be moot except that we really should define
> what happens for the future.
> 
> This calling convention would also be useful in the future for vectorizing
> functions that return complex values either by value or by reference.

Only by value, you really don't know what the code does if something is
passed by reference, whether it is read, written into, or both etc.
And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
pass them, just GCC isn't able to do that right now.

> Well, as a matter of practicality, we don't have any of that implemented
> in the rs6000 back end, and we don't have any free resources to do that
> in GCC 11.  Is there any documentation about what needs to be done to
> support this?  I've always been under the impression that vectorizing for
> masking when there isn't any hardware support is a losing proposition, so
> we've not investigated it.

You don't need to do pretty much anything, except set
clonei->mask_mode = VOIDmode; I think the generic code should handle
everything beyond that, in particular adding the mask argument and using it
both on the caller side and in the expansion of the to-be-vectorized clone.

Jakub



[PATCH 2/4] arc: Improve code gen for 64bit add/sub operations.

2020-02-26 Thread Claudiu Zissulescu
Early expand ADDDI3 and SUBDI3 for better code gen.

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.md (adddi3): Early expand the 64bit operation into
32bit ops.
(subdi3): Likewise.
(adddi3_i): Remove pattern.
(subdi3_i): Likewise.
---
 gcc/config/arc/arc.md | 116 +++---
 1 file changed, 41 insertions(+), 75 deletions(-)

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 0d92674441d..4c34861597b 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -2759,34 +2759,20 @@ archs4x, archs4xd"
   ")
 
 (define_expand "adddi3"
-  [(parallel [(set (match_operand:DI 0 "dest_reg_operand" "")
-  (plus:DI (match_operand:DI 1 "register_operand" "")
-   (match_operand:DI 2 "nonmemory_operand" "")))
- (clobber (reg:CC CC_REG))])]
-  ""
-{})
-
-; This assumes that there can be no strictly partial overlap between
-; operands[1] and operands[2].
-(define_insn_and_split "*adddi3_i"
-  [(set (match_operand:DI 0 "dest_reg_operand" "=&w,w,w")
-   (plus:DI (match_operand:DI 1 "register_operand" "%c,0,c")
-(match_operand:DI 2 "nonmemory_operand" "ci,ci,!i")))
+  [(set (match_operand:DI 0 "register_operand" "")
+   (plus:DI (match_operand:DI 1 "register_operand" "")
+(match_operand:DI 2 "nonmemory_operand" "")))
(clobber (reg:CC CC_REG))]
   ""
-  "#"
-  "reload_completed"
-  [(const_int 0)]
-{
-  int hi = !TARGET_BIG_ENDIAN;
-  int lo = !hi;
-  rtx l0 = operand_subword (operands[0], lo, 0, DImode);
-  rtx h0 = operand_subword (operands[0], hi, 0, DImode);
-  rtx l1 = operand_subword (operands[1], lo, 0, DImode);
-  rtx h1 = operand_subword (operands[1], hi, 0, DImode);
-  rtx l2 = operand_subword (operands[2], lo, 0, DImode);
-  rtx h2 = operand_subword (operands[2], hi, 0, DImode);
-
+  "
+  rtx l0 = gen_lowpart (SImode, operands[0]);
+  rtx h0 = gen_highpart (SImode, operands[0]);
+  rtx l1 = gen_lowpart (SImode, operands[1]);
+  rtx h1 = gen_highpart (SImode, operands[1]);
+  rtx l2 = simplify_gen_subreg (SImode, operands[2], DImode,
+subreg_lowpart_offset (SImode, DImode));
+  rtx h2 = simplify_gen_subreg (SImode, operands[2], DImode,
+subreg_highpart_offset (SImode, DImode));
 
   if (l2 == const0_rtx)
 {
@@ -2797,13 +2783,6 @@ archs4x, archs4xd"
emit_move_insn (l0, l1);
   DONE;
 }
-  if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) < 0
-  && INTVAL (operands[2]) >= -0x7fff)
-{
-  emit_insn (gen_subdi3_i (operands[0], operands[1],
-GEN_INT (-INTVAL (operands[2];
-  DONE;
-}
   if (rtx_equal_p (l0, h1))
 {
   if (h2 != const0_rtx)
@@ -2817,28 +2796,32 @@ archs4x, archs4xd"
   gen_rtx_LTU (VOIDmode, gen_rtx_REG (CC_Cmode, CC_REG), GEN_INT (0)),
   gen_rtx_SET (h0, plus_constant (SImode, h0, 1;
   DONE;
-}
+  }
   emit_insn (gen_add_f (l0, l1, l2));
   emit_insn (gen_adc (h0, h1, h2));
   DONE;
-}
-  [(set_attr "cond" "clob")
-   (set_attr "type" "binary")
-   (set_attr "length" "16,16,20")])
+")
 
 (define_insn "add_f"
   [(set (reg:CC_C CC_REG)
(compare:CC_C
- (plus:SI (match_operand:SI 1 "register_operand" "c,0,c")
-  (match_operand:SI 2 "nonmemory_operand" "cL,I,cCal"))
+ (plus:SI (match_operand:SI 1 "nonmemory_operand" "%r,L,0,I,Cal,r")
+  (match_operand:SI 2 "nonmemory_operand" "rL,r,I,0,  r,rCal"))
  (match_dup 1)))
-   (set (match_operand:SI 0 "dest_reg_operand" "=w,Rcw,w")
+   (set (match_operand:SI 0 "dest_reg_operand" "=r,r,r,r,r,r")
(plus:SI (match_dup 1) (match_dup 2)))]
-  ""
-  "add.f %0,%1,%2"
+  "register_operand (operands[1], SImode)
+   || register_operand (operands[2], SImode)"
+  "@
+  add.f\\t%0,%1,%2
+  add.f\\t%0,%2,%1
+  add.f\\t%0,%1,%2
+  add.f\\t%0,%2,%1
+  add.f\\t%0,%2,%1
+  add.f\\t%0,%1,%2"
   [(set_attr "cond" "set")
(set_attr "type" "compare")
-   (set_attr "length" "4,4,8")])
+   (set_attr "length" "4,4,4,4,8,8")])
 
 (define_insn "*add_f_2"
   [(set (reg:CC_C CC_REG)
@@ -2993,35 +2976,20 @@ archs4x, archs4xd"
   ])
 
 (define_expand "subdi3"
-  [(parallel [(set (match_operand:DI 0 "dest_reg_operand" "")
-  (minus:DI (match_operand:DI 1 "nonmemory_operand" "")
-(match_operand:DI 2 "nonmemory_operand" "")))
- (clobber (reg:CC CC_REG))])]
-  ""
-{
-  if (!register_operand (operands[2], DImode))
-operands[1] = force_reg (DImode, operands[1]);
-})
-
-(define_insn_and_split "subdi3_i"
-  [(set (match_operand:DI 0 "dest_reg_operand" "=&w,w,w,w,w")
-   (minus:DI (match_operand:DI 1 "nonmemory_operand" "ci,0,ci,c,!i")
- (match_operand:DI 2 "nonmemory_operand" "ci,ci,0,!i,c")))
+  [(set (match_operand:DI 0 "register_operand" "")
+   (minus:DI (match_operand:DI 1 "registe

[PATCH 1/4] arc: Add length attribute to eh_return pattern.

2020-02-26 Thread Claudiu Zissulescu
Add length attribute to eh_return pattern.

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.md (eh_return): Add length info.
---
 gcc/config/arc/arc.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index fb25aafb024..0d92674441d 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -6765,7 +6765,9 @@ archs4x, archs4xd"
 arc_eh_return_address_location (operands[0]);
 DONE;
   }"
-)
+  [(set_attr "length" "8")]
+  )
+
 ;; include the arc-FPX instructions
 (include "fpx.md")
 
-- 
2.24.1



[PATCH 4/4] arc: Update legitimate small data address.

2020-02-26 Thread Claudiu Zissulescu
All of ARC's small data addressing uses the address-scaling feature of the
load/store instructions (i.e., the address is made of a general
pointer plus a shifted offset; the shift amount depends on the
addressing mode).  This patch checks whether the offset of an address
fits the scaled constraint.  If so, a small data access is
generated.  This fixes the execute pr93249 failure.
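The legality check the patch adds can be summarized as: byte accesses take any offset, halfword accesses need an even offset, and word/doubleword accesses need a multiple of four. A standalone sketch of that predicate (illustrative, not the GCC function itself):

```cpp
// Mirrors the mode-size switch added to legitimate_small_data_address_p:
// the sdata offset must be aligned to the load/store scaling.
bool sdata_offset_ok(long offset, int mode_size)
{
  switch (mode_size)
    {
    case 1:
      return true;                     // byte: no scaling restriction
    case 2:
      return (offset & 0x1) == 0;      // halfword: even offsets only
    case 4:
    case 8:
      return (offset & 0x3) == 0;      // word/doubleword: multiple of 4
    default:
      return false;
    }
}
```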

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.c (legitimate_small_data_address_p): Check if an
address has an offset which fits the scaling constraint for a
load/store operation.
(legitimate_scaled_address_p): Update use of
legitimate_small_data_address_p.
(arc_print_operand): Likewise.
(arc_legitimate_address_p): Likewise.
(compact_sda_memory_operand): Likewise.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 960645fdfbf..af26e5b9711 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -463,21 +463,37 @@ get_symbol_alignment (rtx x)
 /* Return true if x is ok to be used as a small data address.  */
 
 static bool
-legitimate_small_data_address_p (rtx x)
+legitimate_small_data_address_p (rtx x, machine_mode mode)
 {
   switch (GET_CODE (x))
 {
 case CONST:
-  return legitimate_small_data_address_p (XEXP (x, 0));
+  return legitimate_small_data_address_p (XEXP (x, 0), mode);
 case SYMBOL_REF:
   return SYMBOL_REF_SMALL_P (x);
 case PLUS:
   {
bool p0 = (GET_CODE (XEXP (x, 0)) == SYMBOL_REF)
  && SYMBOL_REF_SMALL_P (XEXP (x, 0));
-   bool p1 = CONST_INT_P (XEXP (x, 1))
- && (INTVAL (XEXP (x, 1)) <= g_switch_value);
-   return p0 && p1;
+
+   /* If no constant then we cannot do small data.  */
+   if (!CONST_INT_P (XEXP (x, 1)))
+ return false;
+
+   /* Small data relocs work with scaled addresses; check if
+  the immediate fits the requirements.  */
+   switch (GET_MODE_SIZE (mode))
+ {
+ case 1:
+   return p0;
+ case 2:
+   return p0 && ((INTVAL (XEXP (x, 1)) & 0x1) == 0);
+ case 4:
+ case 8:
+   return p0 && ((INTVAL (XEXP (x, 1)) & 0x3) == 0);
+ default:
+   return false;
+ }
   }
 default:
   return false;
@@ -531,7 +547,7 @@ legitimate_scaled_address_p (machine_mode mode, rtx op, 
bool strict)
 }
 
   /* Scalled addresses for sdata is done other places.  */
-  if (legitimate_small_data_address_p (op))
+  if (legitimate_small_data_address_p (op, mode))
 return false;
 
   if (CONSTANT_P (XEXP (op, 1)))
@@ -4825,7 +4841,7 @@ arc_print_operand (FILE *file, rtx x, int code)
  break;
case SYMBOL_REF:
case CONST:
- if (legitimate_small_data_address_p (addr)
+ if (legitimate_small_data_address_p (addr, GET_MODE (x))
  && GET_MODE_SIZE (GET_MODE (x)) > 1)
{
  int align = get_symbol_alignment (addr);
@@ -4958,7 +4974,7 @@ arc_print_operand (FILE *file, rtx x, int code)
rtx addr = XEXP (x, 0);
int size = GET_MODE_SIZE (GET_MODE (x));
 
-   if (legitimate_small_data_address_p (addr))
+   if (legitimate_small_data_address_p (addr, GET_MODE (x)))
  output_sdata = 1;
 
fputc ('[', file);
@@ -6716,7 +6732,7 @@ arc_legitimate_address_p (machine_mode mode, rtx x, bool 
strict)
  return true;
   if (legitimate_scaled_address_p (mode, x, strict))
 return true;
-  if (legitimate_small_data_address_p (x))
+  if (legitimate_small_data_address_p (x, mode))
  return true;
   if (GET_CODE (x) == CONST_INT && LARGE_INT (INTVAL (x)))
  return true;
@@ -8870,7 +8886,7 @@ compact_sda_memory_operand (rtx op, machine_mode mode, 
bool short_p)
   /* Decode the address now.  */
   addr = XEXP (op, 0);
 
-  if (!legitimate_small_data_address_p (addr))
+  if (!legitimate_small_data_address_p (addr, mode))
 return false;
 
   if (!short_p || size == 1)
-- 
2.24.1



[PATCH 3/4] arc: Use accl_operand predicate for fma instructions.

2020-02-26 Thread Claudiu Zissulescu
With the refurbishing of ARC600's accumulator support, the mlo_operand
predicate no longer matches the proper low accumulator register for the
newer ARCv2 accumulator register used by the fma instructions; replace
it with the accl_operand predicate.

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.md (fmasf4_fpu): Use accl_operand predicate.
(fnmasf4_fpu): Likewise.
---
 gcc/config/arc/fpu.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arc/fpu.md b/gcc/config/arc/fpu.md
index 6729795de54..14ebd865206 100644
--- a/gcc/config/arc/fpu.md
+++ b/gcc/config/arc/fpu.md
@@ -89,7 +89,7 @@
   [(set (match_operand:SF 0 "register_operand"  "=r,r,r,r,r")
(fma:SF (match_operand:SF 1 "nonmemory_operand" "%0,r,0,r,F")
(match_operand:SF 2 "nonmemory_operand"  "r,r,F,F,r")
-   (match_operand:SF 3 "mlo_operand" "")))]
+   (match_operand:SF 3 "accl_operand" "")))]
   "TARGET_FP_SP_FUSED
&& (register_operand (operands[1], SFmode)
|| register_operand (operands[2], SFmode))"
@@ -104,7 +104,7 @@
   [(set (match_operand:SF 0 "register_operand"  "=r,r,r,r,r")
(fma:SF (neg:SF (match_operand:SF 1 "nonmemory_operand" "%0,r,0,r,F"))
(match_operand:SF 2 "nonmemory_operand"  "r,r,F,F,r")
-   (match_operand:SF 3 "mlo_operand" "")))]
+   (match_operand:SF 3 "accl_operand" "")))]
   "TARGET_FP_SP_FUSED
&& (register_operand (operands[1], SFmode)
|| register_operand (operands[2], SFmode))"
-- 
2.24.1



Re: patch to fix PR93564

2020-02-26 Thread Andrew Stubbs

On 23/02/2020 21:25, Vladimir Makarov wrote:

The following patch is for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93564

The patch was successfully bootstrapped on x86-64 and benchmarked on 
SPEC2000.


Since this patch I get an ICE with checking enabled, for amdgcn-amdhsa:

during RTL pass: reload
dump file: /scratch/astubbs/amd/upstreamB/tmp/target.283r.reload
../gcc.c-torture/execute/ieee/compare-fp-1.c: In function 'ieq':
../gcc.c-torture/execute/ieee/compare-fp-1.c:33:1: internal 
compiler error: in lra_constraints, at lra-constraints.c:5051

   33 | }
  | ^
0x103d7cf lra_constraints(bool)
/gcc/lra-constraints.c:5051
0x102541a lra(_IO_FILE*)
/gcc/lra.c:2437
0xfcc9c9 do_reload
/gcc/ira.c:5523
0xfcce50 execute
/gcc/ira.c:5709
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

This happens at all optimization levels (but not -O0).

The problem appears to be that the high-part of a register pair is not 
marked as "ever live".  I'm trying to figure out whether this is some 
kind of target-specific issue that has merely been exposed, but it's 
difficult to see what's going on. I'm pretty sure I've never seen this 
one before.


Andrew


[committed] libstdc++: Add __maybe_const_t and __maybe_empty_t aliases

2020-02-26 Thread Jonathan Wakely
This introduces a couple of convenience alias templates to be used for
some repeated patterns using std::conditional_t.
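The pattern the new `__detail` helpers capture looks like this (names here are illustrative stand-ins for the reserved `__maybe_const_t`/`__maybe_empty_t` identifiers):

```cpp
#include <type_traits>

// Select const or non-const T depending on a bool, as the ranges
// iterator/sentinel types do for their _Const template parameter.
template<bool Const, typename T>
using maybe_const_t = std::conditional_t<Const, const T, T>;

// Replace a possibly-empty member's type with an empty tag so
// [[no_unique_address]] can elide its storage.
struct Empty { };
template<bool NonEmpty, typename T>
using maybe_empty_t = std::conditional_t<NonEmpty, T, Empty>;

static_assert(std::is_same_v<maybe_const_t<true, int>, const int>);
static_assert(std::is_same_v<maybe_const_t<false, int>, int>);
```

Each alias just names a `std::conditional_t` idiom that previously had to be spelled out at every use site.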

* include/std/ranges (__detail::__maybe_empty_t): Define new helper
alias.
(__detail::__maybe_const_t): Likewise.
(__adaptor::_RangeAdaptor): Use __maybe_empty_t.
(transform_view, take_view, take_while_view, elements_view): Use
__maybe_const_t.
(join_view, split_view): Use both.

Tested powerpc64l-linux, committed to master.

commit 8017d95c7f55b98bcee1caf0216fdfd7fd613849
Author: Jonathan Wakely 
Date:   Wed Feb 26 15:19:43 2020 +

libstdc++: Add __maybe_const_t and __maybe_empty_t aliases

This introduces a couple of convenience alias templates to be used for
some repeated patterns using std::conditional_t.

* include/std/ranges (__detail::__maybe_empty_t): Define new helper
alias.
(__detail::__maybe_const_t): Likewise.
(__adaptor::_RangeAdaptor): Use __maybe_empty_t.
(transform_view, take_view, take_while_view, elements_view): Use
__maybe_const_t.
(join_view, split_view): Use both.

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index a7f4da957ef..d8326632166 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1029,6 +1029,13 @@ namespace views
 namespace __detail
 {
   struct _Empty { };
+
+  template
+using __maybe_empty_t = conditional_t<_NonEmpty, _Tp, _Empty>;
+
+  template
+using __maybe_const_t = conditional_t<_Const, const _Tp, _Tp>;
+
 } // namespace __detail
 
 namespace views
@@ -1058,8 +1065,8 @@ namespace views
   {
   protected:
[[no_unique_address]]
- conditional_t,
-   _Callable, __detail::_Empty> _M_callable;
+ __detail::__maybe_empty_t,
+   _Callable> _M_callable;
 
   public:
constexpr
@@ -1552,9 +1559,8 @@ namespace views
struct _Iterator
{
private:
- using _Parent
-   = conditional_t<_Const, const transform_view, transform_view>;
- using _Base = conditional_t<_Const, const _Vp, _Vp>;
+ using _Parent = __detail::__maybe_const_t<_Const, transform_view>;
+ using _Base = __detail::__maybe_const_t<_Const, _Vp>;
 
  static constexpr auto
  _S_iter_concept()
@@ -1760,9 +1766,8 @@ namespace views
struct _Sentinel
{
private:
- using _Parent
-   = conditional_t<_Const, const transform_view, transform_view>;
- using _Base = conditional_t<_Const, const _Vp, _Vp>;
+ using _Parent = __detail::__maybe_const_t<_Const, transform_view>;
+ using _Base = __detail::__maybe_const_t<_Const, _Vp>;
 
  constexpr range_difference_t<_Base>
  __distance_from(const _Iterator<_Const>& __i) const
@@ -1886,7 +1891,7 @@ namespace views
struct _Sentinel
{
private:
- using _Base = conditional_t<_Const, const _Vp, _Vp>;
+ using _Base = __detail::__maybe_const_t<_Const, _Vp>;
  using _CI = counted_iterator>;
 
  sentinel_t<_Base> _M_end = sentinel_t<_Base>();
@@ -2025,7 +2030,7 @@ namespace views
struct _Sentinel
{
private:
- using _Base = conditional_t<_Const, const _Vp, _Vp>;
+ using _Base = __detail::__maybe_const_t<_Const, _Vp>;
 
  sentinel_t<_Base> _M_end = sentinel_t<_Base>();
  const _Pred* _M_pred = nullptr;
@@ -2258,8 +2263,8 @@ namespace views
struct _Iterator
{
private:
- using _Parent = conditional_t<_Const, const join_view, join_view>;
- using _Base = conditional_t<_Const, const _Vp, _Vp>;
+ using _Parent = __detail::__maybe_const_t<_Const, join_view>;
+ using _Base = __detail::__maybe_const_t<_Const, _Vp>;
 
  static constexpr bool _S_ref_is_glvalue
= is_reference_v>;
@@ -2450,8 +2455,8 @@ namespace views
struct _Sentinel
{
private:
- using _Parent = conditional_t<_Const, const join_view, join_view>;
- using _Base = conditional_t<_Const, const _Vp, _Vp>;
+ using _Parent = __detail::__maybe_const_t<_Const, join_view>;
+ using _Base = __detail::__maybe_const_t<_Const, _Vp>;
 
  constexpr bool
  __equal(const _Iterator<_Const>& __i) const
@@ -2482,8 +2487,8 @@ namespace views
 
   // XXX: _M_inner is "present only when !is_reference_v<_InnerRange>"
   [[no_unique_address]]
-   conditional_t,
- views::all_t<_InnerRange>, __detail::_Empty> _M_inner;
+   __detail::__maybe_empty_t,
+ views::all_t<_InnerRange>> _M_inner;
 
 public:
   join_view() = default;
@@ -2585,8 +2590,8 @@ namespace views
struct _OuterIter
{
private:
- using _Parent = condit

[committed] libstdc++: Two simplifications for lexicographical_compare

2020-02-26 Thread Jonathan Wakely
* include/bits/ranges_algo.h (__lexicographical_compare_fn): Declare
variables in smaller scope and avoid calling ranges::distance when we
know they are pointers. Remove statically-unreachable use of
__builtin_unreachable().
* include/bits/stl_algobase.h (__lexicographical_compare::__lc):
Define inline.

Tested powerpc64le-linux, committed to master.


commit 113f0a639dbdd78048373a253ec64145ead7d29d
Author: Jonathan Wakely 
Date:   Wed Feb 26 15:19:44 2020 +

libstdc++: Two simplifications for lexicographical_compare

* include/bits/ranges_algo.h (__lexicographical_compare_fn): Declare
variables in smaller scope and avoid calling ranges::distance when we
know they are pointers. Remove statically-unreachable use of
__builtin_unreachable().
* include/bits/stl_algobase.h (__lexicographical_compare::__lc):
Define inline.

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index 7d7dbf04103..05c0851d411 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -3464,9 +3464,6 @@ namespace ranges
 && sized_sentinel_for<_Sent2, _Iter2>);
if constexpr (__sized_iters)
  {
-   auto __d1 = ranges::distance(__first1, __last1);
-   auto __d2 = ranges::distance(__first2, __last2);
-
using _ValueType1 = iter_value_t<_Iter1>;
using _ValueType2 = iter_value_t<_Iter2>;
constexpr bool __use_memcmp
@@ -3480,6 +3477,9 @@ namespace ranges
 && is_same_v<_Proj2, identity>);
if constexpr (__use_memcmp)
  {
+   const auto __d1 = __last1 - __first1;
+   const auto __d2 = __last2 - __first2;
+
if (const auto __len = std::min(__d1, __d2))
  {
const auto __c
@@ -3498,10 +3498,8 @@ namespace ranges
if (__c < 0)
  return false;
  }
-   else
- __builtin_unreachable();
  }
-   return (__last1 - __first1 < __last2 - __first2);
+   return __d1 < __d2;
  }
  }
 
diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index 268569336b0..e4f7fa417ed 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1254,21 +1254,16 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 {
   template
_GLIBCXX20_CONSTEXPR
-   static bool __lc(_II1, _II1, _II2, _II2);
+   static bool
+   __lc(_II1 __first1, _II1 __last1, _II2 __first2, _II2 __last2)
+   {
+ using __gnu_cxx::__ops::__iter_less_iter;
+ return std::__lexicographical_compare_impl(__first1, __last1,
+__first2, __last2,
+__iter_less_iter());
+   }
 };
 
-  template
-template
-  _GLIBCXX20_CONSTEXPR
-  bool
-  __lexicographical_compare<_BoolType>::
-  __lc(_II1 __first1, _II1 __last1, _II2 __first2, _II2 __last2)
-  {
-   return std::__lexicographical_compare_impl(__first1, __last1,
-  __first2, __last2,
-   __gnu_cxx::__ops::__iter_less_iter());
-  }
-
   template<>
 struct __lexicographical_compare
 {


Re: [PATCH] libstdc++: P1645R1 constexpr for algorithms

2020-02-26 Thread Patrick Palka
On Wed, 26 Feb 2020, Jonathan Wakely wrote:

> On 25/02/20 15:36 -0500, Patrick Palka wrote:
> > This adds constexpr to 11 algorithms defined in  as per P1645R1.
> > 
> > Tested on x86_64-pc-linux-gnu, OK to commit?
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > P1645R1 constexpr for  algorithms
> > * include/bits/stl_numeric.h (iota, accumulate, inner_product,
> > partial_sum, adjacent_difference): Make conditionally constexpr for
> > C++20.
> > * include/std/numeric (reduce, transform_reduce, exclusive_scan,
> > inclusive_scan, transform_exclusive_scan, transform_inclusive_scan):
> > Likewise.  Define the feature test macro __cpp_lib_constexpr_numeric.
> > * testsuite/26_numerics/accumulate/constexpr.cc: New test.
> > * testsuite/26_numerics/adjacent_difference/constexpr.cc: Likewise.
> > * testsuite/26_numerics/exclusive_scan/constexpr.cc: Likewise.
> > * testsuite/26_numerics/inclusive_scan/constexpr.cc: Likewise.
> > * testsuite/26_numerics/inner_product/constexpr.cc: Likewise.
> > * testsuite/26_numerics/iota/constexpr.cc: Likewise.
> > * testsuite/26_numerics/partial_sum/constexpr.cc: Likewise.
> > * testsuite/26_numerics/reduce/constexpr.cc: Likewise.
> > * testsuite/26_numerics/transform_exclusive_scan/constexpr.cc:
> > Likewise.
> > * testsuite/26_numerics/transform_inclusive_scan/constexpr.cc:
> > Likewise.
> > * testsuite/26_numerics/transform_reduce/constexpr.cc: Likewise.
> > ---
> > libstdc++-v3/include/bits/stl_numeric.h   |  9 +++
> > libstdc++-v3/include/std/numeric  | 18 ++
> > .../26_numerics/accumulate/constexpr.cc   | 42 ++
> > .../adjacent_difference/constexpr.cc  | 44 +++
> > .../26_numerics/exclusive_scan/constexpr.cc   | 44 +++
> > .../26_numerics/inclusive_scan/constexpr.cc   | 55 +++
> > .../26_numerics/inner_product/constexpr.cc| 45 +++
> > .../testsuite/26_numerics/iota/constexpr.cc   | 31 +++
> > .../26_numerics/partial_sum/constexpr.cc  | 44 +++
> > .../testsuite/26_numerics/reduce/constexpr.cc | 52 ++
> > .../transform_exclusive_scan/constexpr.cc | 33 +++
> > .../transform_inclusive_scan/constexpr.cc | 44 +++
> > .../26_numerics/transform_reduce/constexpr.cc | 55 +++
> > 13 files changed, 516 insertions(+)
> > create mode 100644
> > libstdc++-v3/testsuite/26_numerics/accumulate/constexpr.cc
> > create mode 100644
> > libstdc++-v3/testsuite/26_numerics/adjacent_difference/constexpr.cc
> > create mode 100644
> > libstdc++-v3/testsuite/26_numerics/exclusive_scan/constexpr.cc
> > create mode 100644
> > libstdc++-v3/testsuite/26_numerics/inclusive_scan/constexpr.cc
> > create mode 100644
> > libstdc++-v3/testsuite/26_numerics/inner_product/constexpr.cc
> > create mode 100644 libstdc++-v3/testsuite/26_numerics/iota/constexpr.cc
> > create mode 100644
> > libstdc++-v3/testsuite/26_numerics/partial_sum/constexpr.cc
> > create mode 100644 libstdc++-v3/testsuite/26_numerics/reduce/constexpr.cc
> > create mode 100644
> > libstdc++-v3/testsuite/26_numerics/transform_exclusive_scan/constexpr.cc
> > create mode 100644
> > libstdc++-v3/testsuite/26_numerics/transform_inclusive_scan/constexpr.cc
> > create mode 100644
> > libstdc++-v3/testsuite/26_numerics/transform_reduce/constexpr.cc
> > 
> > diff --git a/libstdc++-v3/include/bits/stl_numeric.h
> > b/libstdc++-v3/include/bits/stl_numeric.h
> > index dd4683fe92e..f95c86a0d48 100644
> > --- a/libstdc++-v3/include/bits/stl_numeric.h
> > +++ b/libstdc++-v3/include/bits/stl_numeric.h
> > @@ -83,6 +83,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >*  @ingroup numeric_ops
> >*/
> >   template
> > +_GLIBCXX20_CONSTEXPR
> > void
> > iota(_ForwardIterator __first, _ForwardIterator __last, _Tp __value)
> > {
> > @@ -128,6 +129,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
> >*  @return  The final sum.
> >*/
> >   template
> > +_GLIBCXX20_CONSTEXPR
> > inline _Tp
> > accumulate(_InputIterator __first, _InputIterator __last, _Tp __init)
> > {
> > @@ -154,6 +156,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
> >*  @return  The final sum.
> >*/
> >   template
> > +_GLIBCXX20_CONSTEXPR
> > inline _Tp
> > accumulate(_InputIterator __first, _InputIterator __last, _Tp __init,
> >_BinaryOperation __binary_op)
> > @@ -182,6 +185,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
> >*  @return  The final inner product.
> >*/
> >   template
> > +_GLIBCXX20_CONSTEXPR
> > inline _Tp
> > inner_product(_InputIterator1 __first1, _InputIterator1 __last1,
> >   _InputIterator2 __first2, _Tp __init)
> > @@ -214,6 +218,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
> >*/
> >   template >typename _BinaryOperation1, typename _BinaryOperation2>
> > +_GLIBCXX20_CONSTEXPR
> > inline _Tp
> > inner_product(_InputIterator1 __first1, _InputIterator1 __la

Re: middle-end: Fix wrong code caused by disagreement between FRE and access path oracle [PR 92152]

2020-02-26 Thread Jan Hubicka
Hi,
here are the alias oracle and TBAA stats for building cc1 with -flto-partition=none.

From:

Alias oracle query stats:
  refs_may_alias_p: 46099243 disambiguations, 55677716 queries
  ref_maybe_used_by_call_p: 124351 disambiguations, 46883813 queries
  call_may_clobber_ref_p: 12673 disambiguations, 17133 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 3803 queries
  nonoverlapping_refs_since_match_p: 19034 disambiguations, 46849 must overlaps, 67934 queries
  aliasing_component_refs_p: 76737 disambiguations, 300234 queries
  TBAA oracle: 15816119 disambiguations 39888339 queries
   12364426 are in alias set 0
   7655945 queries asked about the same object
   178 queries asked about the same alias set
   0 access volatile
   2963837 are dependent in the DAG
   1087834 are aritificially in conflict with void *

PTA query stats:
  pt_solution_includes: 904096 disambiguations, 9062937 queries
  pt_solutions_intersect: 853990 disambiguations, 10098128 queries

to:

Alias oracle query stats:
  refs_may_alias_p: 48168904 disambiguations, 57845554 queries
  ref_maybe_used_by_call_p: 124062 disambiguations, 48953042 queries
  call_may_clobber_ref_p: 12658 disambiguations, 17137 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 3312 queries
  nonoverlapping_refs_since_match_p: 18997 disambiguations, 45778 must overlaps, 67109 queries
  aliasing_component_refs_p: 58756 disambiguations, 296126 queries
  TBAA oracle: 16036749 disambiguations 40132907 queries
   12352609 are in alias set 0
   7697466 queries asked about the same object
   178 queries asked about the same alias set
   0 access volatile
   2964615 are dependent in the DAG
   1081290 are aritificially in conflict with void *

PTA query stats:
  pt_solution_includes: 826579 disambiguations, 8987330 queries
  pt_solutions_intersect: 841758 disambiguations, 10078495 queries

So aliasing_component_refs_p drops from a 25% disambiguation rate to 19%, which
is quite noticeable. I will run SPEC benchmarks.
Firefox is not very relevant here since it is built with -fno-strict-aliasing.

Honza


Re: [PATCH] Add c++2a binary_semaphore

2020-02-26 Thread Jonathan Wakely

On 24/02/20 21:53 -0500, Thomas Rodgers wrote:

+bool
+_S_futex_wait_until(int* __addr, int __val,
+   bool __has_timeout = false,
+   std::chrono::seconds __s = std::chrono::seconds::zero(),
+   std::chrono::nanoseconds __ns = std::chrono::nanoseconds::zero())
+{
+  if (!__has_timeout)
+   {
+ syscall (SYS_futex, __addr, 0, __val, nullptr);
+ // Can't do anything about an error here except abort, so ignore it.
+   }
+  else
+   {
+ struct timeval __tv;
+ gettimeofday(&__tv, NULL);
+ struct timespec __rt;
+ __rt.tv_sec = __s.count() - __tv.tv_sec;
+ __rt.tv_nsec = __ns.count() - __tv.tv_usec * 1000;
+ if (__rt.tv_nsec < 0)
+   {
+ __rt.tv_nsec += 1000000000;
+ --__rt.tv_sec;
+   }
+ if (__rt.tv_sec < 0)
+   return false;
+
+ if (syscall (SYS_futex, __addr, 0, __val, &__rt) == -1)


This syscall has the same problem as https://gcc.gnu.org/PR93421 so we
should avoid introducing a new instance of the bug.



Re: [PATCH] libstdc++: Memoize {drop,drop_while,filter,reverse}_view::begin

2020-02-26 Thread Patrick Palka
On Tue, 11 Feb 2020, Patrick Palka wrote:

> This patch adds memoization for these four views so that their begin() has the
> required constant time amortized complexity.
> 
> In the general case we use std::optional to cache the result.  When the
> underlying range is a random_access_range then we store the cache as an offset
> from the beginning of the range, which should be more compact.  And when the
> underlying iterator is not copyable, then we completely disable the cache.
> 
> Using std::optional in the cache is not ideal though because it means that the
> cache can't be utilized during constexpr evaluation.  If instead of
> std::optional we store a separate flag to denote an empty cache then we'll be
> able to use the cache during constexpr evaluation at the cost of an extra
> byte or so.  I am not sure which design to settle on.

Here's v2 of this patch which uses the new helper
__detail::__maybe_empty_t and provides a more descriptive commit
message.  It also refines the constraints on the partial specializations
of _CachedPosition.

-- >8 --

Subject: [PATCH] libstdc++: Memoize
 {drop,drop_while,filter,reverse}_view::begin

This patch adds memoization for these four views so that their begin() has the
required amortized constant time complexity.

In the general case we use an std::optional> to cache the
iterator to the first element of the view.  But when the underlying range models
random_access_range and when sizeof(range_difference_t<_Range>) is no larger
than sizeof(iterator_t<_Range>), then we instead cache the offset to the first
element.  Otherwise, when the underlying range does not model forward_range,
then we completely disable the cache.

Additionally, in drop_view and reverse_view, we disable the cache when the
underlying range models random_access_range, because in these cases recomputing
begin() takes O(1) time anyway.

Using std::optional to cache the iterator in the general case is not ideal
though because it means that the cache can't be utilized during constexpr
evaluation.  If instead of std::optional we store an (iterator, bool) pair where
the bool denotes whether the cache is empty, then we'll be able to use the cache
during constexpr evaluation at the cost of an extra byte or so in the size of the
corresponding view structure.  I'm not sure which trade-off to choose.

libstdc++-v3/ChangeLog:

* include/std/ranges (__detail::_CachedPosition): New struct.
(views::filter_view::_S_needs_cached_begin): New member variable.
(views::filter_view::_M_cached_begin): New member variable.
(views::filter_view::begin): Use _M_cached_begin to cache its
result.
(views::drop_view::_S_needs_cached_begin): New static member variable.
(views::drop_view::_M_cached_begin): New member variable.
(views::drop_view::begin): Use _M_cached_begin to cache its result
when _S_needs_cached_begin.
(views::drop_while_view::_M_cached_begin): New member variable.
(views::drop_while_view::begin): Use _M_cached_begin to cache its
result.
(views::reverse_view::_S_needs_cached_begin): New static member
variable.
(views::reverse_view::_M_cached_begin): New member variable.
(views::reverse_view::begin): Use _M_cached_begin to cache its result
when _S_needs_cached_begin.
* testsuite/std/ranges/adaptors/drop.cc: Augment test to check that
drop_view::begin caches its result.
* testsuite/std/ranges/adaptors/drop_while.cc: Augment test to check
that drop_while_view::begin caches its result.
* testsuite/std/ranges/adaptors/filter.cc: Augment test to check that
filter_view::begin caches its result.
* testsuite/std/ranges/adaptors/reverse.cc: Augment test to check that
reverse_view::begin caches its result.
---
 libstdc++-v3/include/std/ranges   | 136 --
 .../testsuite/std/ranges/adaptors/drop.cc |  57 
 .../std/ranges/adaptors/drop_while.cc |  38 -
 .../testsuite/std/ranges/adaptors/filter.cc   |  36 +
 .../testsuite/std/ranges/adaptors/reverse.cc  |  56 
 5 files changed, 308 insertions(+), 15 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 440487b15cb..e7927fb0436 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1335,6 +1335,85 @@ namespace views
   }
   } // namespace __detail
 
+  namespace __detail
+  {
+template
+  struct _CachedPosition
+  {
+  private:
+   std::optional> _M_iter;
+
+  public:
+   constexpr bool
+   _M_has_value() const
+   { return _M_iter.has_value(); }
+
+   constexpr iterator_t<_Range>
+   _M_get(const _Range&) const
+   {
+ __glibcxx_assert(_M_has_value());
+ return *_M_iter;
+   }
+
+   constexpr void
+   _M_set(const _Range&, const iterator_t<_Range>& __it)
+  

Re: GLIBC libmvec status

2020-02-26 Thread Segher Boessenkool
Hi!

On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:
> On 2/25/20 12:45 PM, Segher Boessenkool wrote:
> >I don't agree we should have a new ABI, and an API (which this *is* as
> >far as I can tell) works fine on *any* ABI.  Homogeneous aggregates has
> >nothing to do with anything either.
> >
> >It is fine to only *support* powerpc64le-linux, sure.  But don't fragment
> >the implementation, it only hurts, never helps -- we will end up having
> >to support ten or twenty different compilers, instead of one compiler
> >with a few (mostly) orthogonal variations.  And yes, we should also test
> >everything everywhere, whenever reasonable.
> 
> Thanks, Segher.  Let me ask for some clarification here on how you'd like
> us to proceed.
> 
> The reason that homogeneous aggregates matter (at least somewhat) is that
> the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
> mangling formula that includes the length of parameters and return values.

I don't see how that matters?  A function that returns a struct works
fine on any implementation.  Sure, for small structs it can be returned
in just registers on ELFv2, while some other ABIs push stuff through
memory.  But that is just an implementation detail of the *actual* ABI.
It is good to know about this when designing the mvec stuff, sure, but
this will *work* on *any* ABI.

> Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
> has a superior one, we chose to use ELFv2's calling convention and make use
> of homogeneous aggregates for return values in registers for the case of
> vectorized sincos.

I don't understand this.  You designed this API with the ELFv2 ABI in
mind, sure, but that does not magically make homogeneous aggregates
appear on other ABIs, nor do you need them there.

I'll read the docs again, someone is missing something here, and it
probably is me ;-)


Segher


Re: [PATCH] rs6000: Fix broken gcc.target/powerpc/fold-vec-st-*.c test cases [PR93913]

2020-02-26 Thread Peter Bergner
On 2/25/20 4:25 PM, Segher Boessenkool wrote:
> Well, you now get an extra mask instruction (rldicr for example) as well,
> right?  While that mask usually isn't needed.

I believe the masking is implicit in the pattern used by the vec_st()
builtin, which normally gets mapped to the Altivec stvx insn, so
when we use the d-form stxv insn, we have to make that address
masking explicit with an and.

I still think using vec_st() or any of the other vector store
builtins is less preferable than just dereferencing a pointer.  



>> -/* { dg-final { scan-assembler-times {\mstvx\M} 14 } } */
>> +/* { dg-final { scan-assembler-times {\mstvx|stxv\M} 14 } } */
> 
> That checks if either the string
>   \mstvx
> or the string
>   stxv\M

Oops, good catch.


> You want
>   {\m(stvx|stxv)\M}

Is the patch OK with this variant?  I think it is more readable than
the other variants you gave... at least to me. :-)

Peter





Re: [PATCH] rs6000: Fix broken gcc.target/powerpc/fold-vec-st-*.c test cases [PR93913]

2020-02-26 Thread Peter Bergner
On 2/26/20 11:29 AM, Peter Bergner wrote:
>> You want
>>   {\m(stvx|stxv)\M}

As we discussed offline, the regex above double counts everything,
so I went with {\m(?:stvx|stxv|stxvx)\M} which you pointed me to.
Pushed to master.  I'll push to the gcc-9 branch tomorrow after
Bill's regression scripts confirm it's fixed on master.

Thanks!

Peter


Re: [committed] libstdc++: Add __maybe_const_t and __maybe_empty_t aliases

2020-02-26 Thread Daniel Krügler
On Wed, 26 Feb 2020 at 16:20, Jonathan Wakely wrote:
>
> This introduces a couple of convenience alias templates to be used for
> some repeated patterns using std::conditional_t.

I find it a bit confusing/inconsistent to define __maybe_const_t such
that the bool argument says "is const", while for __maybe_empty_t the
corresponding bool argument says "is not empty". Suggestion: Implement
__maybe_empty_t such that its bool argument says "is empty" or change
the alias to __maybe_nonempty_t.

Thanks,

- Daniel


Re: [PATCH v2] debug/93751 Option to generate DIEs for external variables - ping

2020-02-26 Thread Alexey Neyman

Patch ping.

On 2/19/20 3:30 PM, Alexey Neyman wrote:

Hi all,

Attached is a patch adjusted per discussion in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93751


- The manual is corrected to reflect that DIEs for external variables 
are not generated when using DWARF
- A new option is introduced that implements the behavior that was 
described in the manual (and is what other debugging formats do).


Please review/apply.

Regards,
Alexey.

>From 7a1b7d99a98416ed8b1c8bf2e4877655fd820fd0 Mon Sep 17 00:00:00 2001
From: Alexey Neyman 
Date: Thu, 13 Feb 2020 22:01:10 -0800
Subject: [PATCH] debug/93751 Option to create DIEs for ext. vars

-g1 is described in the manual to generate debug info for functions and
external variables. It does that for older debugging formats but not for
DWARF. This change fixes the manual to describe the current behavior and
introduces a new option, -gexternal-variables, that makes -g1 output the
DIEs for external variables, as it does for older debugging formats.

2020-02-14  Alexey Neyman  

PR debug/93751
* gcc/common.opt: New option, -gexternal-variables.
* gcc/doc/invoke.texi: Describe the new option.
* dwarf2out.c (gen_decl_die): Proceed to generating the DIE if
the debug level is terse, DIEs for external variables have been
requested and the declaration is public.
(dwarf2out_decl): Same.
* gcc/testsuite/gcc.dg/debug/dwarf2/pr93751-1.c: New test.
* gcc/testsuite/gcc.dg/debug/dwarf2/pr93751-2.c: New test.
* gcc/testsuite/gcc.dg/debug/dwarf2/pr93751-3.c: New test.

Signed-off-by: Alexey Neyman 
---
 gcc/common.opt|  4 
 gcc/doc/invoke.texi   | 13 +++--
 gcc/dwarf2out.c   | 14 ++
 gcc/testsuite/gcc.dg/debug/dwarf2/pr93751-1.c |  7 +++
 gcc/testsuite/gcc.dg/debug/dwarf2/pr93751-2.c |  7 +++
 gcc/testsuite/gcc.dg/debug/dwarf2/pr93751-3.c |  7 +++
 6 files changed, 46 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/pr93751-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/pr93751-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/pr93751-3.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 5692cd04374..aec0aacb116 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3075,6 +3075,10 @@ gdwarf-
 Common Driver Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs)
 Generate debug information in DWARF v2 (or later) format.
 
+gexternal-variables
+Common Driver RejectNegative Var(debug_external_variables)
+Generate debug information for external variables.
+
 ggdb
 Common Driver JoinedOrMissing
 Generate debug information in default extended format.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bd9ecebf103..d09cf298c36 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -443,7 +443,8 @@ Objective-C and Objective-C++ Dialects}.
 @item Debugging Options
 @xref{Debugging Options,,Options for Debugging Your Program}.
 @gccoptlist{-g  -g@var{level}  -gdwarf  -gdwarf-@var{version} @gol
--ggdb  -grecord-gcc-switches  -gno-record-gcc-switches @gol
+-gexternal-variables  -ggdb  @gol
+-grecord-gcc-switches  -gno-record-gcc-switches @gol
 -gstabs  -gstabs+  -gstrict-dwarf  -gno-strict-dwarf @gol
 -gas-loc-support  -gno-as-loc-support @gol
 -gas-locview-support  -gno-as-locview-support @gol
@@ -8649,7 +8650,10 @@ Level 0 produces no debug information at all.  Thus, 
@option{-g0} negates
 Level 1 produces minimal information, enough for making backtraces in
 parts of the program that you don't plan to debug.  This includes
 descriptions of functions and external variables, and line number
-tables, but no information about local variables.
+tables, but no information about local variables.  For historical
+reasons, the descriptions of external variables are not generated
+when producing the debugging information in DWARF format; these
+descriptions can be requested using @option{-gexternal-variables}.
 
 Level 3 includes extra information, such as all the macro definitions
 present in the program.  Some debuggers support macro expansion when
@@ -8663,6 +8667,11 @@ confusion with @option{-gdwarf-@var{level}}.
 Instead use an additional @option{-g@var{level}} option to change the
 debug level for DWARF.
 
+@item -gexternal-variables
+@opindex gexternal-variables
+When using level 1 of the debugging information in DWARF format,
+also produce the descriptions of the external variables.
+
 @item -fno-eliminate-unused-debug-symbols
 @opindex feliminate-unused-debug-symbols
 @opindex fno-eliminate-unused-debug-symbols
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index fe46c7e1eee..13c62ad1eec 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -26354,8 +26354,11 @@ gen_decl_die (tree decl, tree origin, struct 
vlr_context *ctx,
 case VAR_DECL:
 case RESULT_DECL:
   /* If we are in terse mode, don't generate any DIEs t

Re: [PATCH v2] debug/93751 Option to generate DIEs for external variables - ping

2020-02-26 Thread Jason Merrill

On 2/26/20 2:02 PM, Alexey Neyman wrote:

Patch ping.

On 2/19/20 3:30 PM, Alexey Neyman wrote:

Hi all,

Attached is a patch adjusted per discussion in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93751


- The manual is corrected to reflect that DIEs for external variables 
are not generated when using DWARF
- A new option is introduced that implements the behavior that was 
described in the manual (and is what other debugging formats do).


Don't we want to fix the DWARF behavior to match the documentation?

Jason



Re: [PATCH v2] debug/93751 Option to generate DIEs for external variables - ping

2020-02-26 Thread Alexander Monakov
On Wed, 26 Feb 2020, Jason Merrill wrote:

> Don't we want to fix the DWARF behavior to match the documentation?

+1 - I think Alexey correctly pointed out in the Bugzilla that debuginfo
growth from this change would be minimal (usually the number of global
variables is very small compared to the number of functions and linemap info).

Alexander


Re: [PATCH v2] debug/93751 Option to generate DIEs for external variables - ping

2020-02-26 Thread Richard Biener
On February 26, 2020 8:26:06 PM GMT+01:00, Alexander Monakov 
 wrote:
>On Wed, 26 Feb 2020, Jason Merrill wrote:
>
>> Don't we want to fix the DWARF behavior to match the documentation?
>
>+1 - I think Alexey correctly pointed out in the Bugzilla that
>debuginfo
>growth from this change would be minimal (usually the number of global
>variables is very small compared to number of functions (and linemap
>info).

+1 from me as well

Richard. 

>Alexander



libgo: update to final Go1.14 release

2020-02-26 Thread Ian Lance Taylor
This patch updates libgo to the final Go1.14 release.  Bootstrapped
and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
0f9e2b205ccb34be93aafa2669c6c9b5a8c557ec
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 7dcb9ecb4fd..a62c8292e0a 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-658fe4f48374167bf8688e8dbd5f85eee155749e
+5fc21bb0d91d916940c21e6d4a3e10ad3f45343d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/MERGE b/libgo/MERGE
index 2d01b8759bf..233492ada69 100644
--- a/libgo/MERGE
+++ b/libgo/MERGE
@@ -1,4 +1,4 @@
-a068054af141c01df5a4519844f4b77273605f4e
+20a838ab94178c55bc4dc23ddc332fce8545a493
 
 The first line of this file holds the git revision number of the
 last merge done from the master library sources.
diff --git a/libgo/VERSION b/libgo/VERSION
index 5304c505acd..f000507a7fe 100644
--- a/libgo/VERSION
+++ b/libgo/VERSION
@@ -1 +1 @@
-go1.14rc1
+go1.14
diff --git a/libgo/go/cmd/go/internal/modcmd/mod.go 
b/libgo/go/cmd/go/internal/modcmd/mod.go
index 17505221587..d72d0cacd68 100644
--- a/libgo/go/cmd/go/internal/modcmd/mod.go
+++ b/libgo/go/cmd/go/internal/modcmd/mod.go
@@ -7,7 +7,6 @@ package modcmd
 
 import (
"cmd/go/internal/base"
-   "cmd/go/internal/cfg"
 )
 
 var CmdMod = &base.Command{
@@ -32,7 +31,3 @@ See 'go help modules' for an overview of module functionality.
cmdWhy,
},
 }
-
-func addModFlags(cmd *base.Command) {
-   cmd.Flag.StringVar(&cfg.ModFile, "modfile", "", "")
-}
diff --git a/libgo/go/cmd/go/internal/modload/import.go 
b/libgo/go/cmd/go/internal/modload/import.go
index 5906d648b49..d7fca8fd2c5 100644
--- a/libgo/go/cmd/go/internal/modload/import.go
+++ b/libgo/go/cmd/go/internal/modload/import.go
@@ -184,8 +184,9 @@ func Import(path string) (m module.Version, dir string, err 
error) {
if !pathIsStd {
if cfg.BuildModReason == "" {
queryErr = fmt.Errorf("import lookup disabled 
by -mod=%s", cfg.BuildMod)
+   } else {
+   queryErr = fmt.Errorf("import lookup disabled 
by -mod=%s\n\t(%s)", cfg.BuildMod, cfg.BuildModReason)
}
-   queryErr = fmt.Errorf("import lookup disabled by 
-mod=%s\n\t(%s)", cfg.BuildMod, cfg.BuildModReason)
}
return module.Version{}, "", &ImportMissingError{Path: path, 
QueryErr: queryErr}
}
diff --git a/libgo/go/cmd/go/internal/vet/vet.go 
b/libgo/go/cmd/go/internal/vet/vet.go
index 660a739fbbd..4e09c0fb9c9 100644
--- a/libgo/go/cmd/go/internal/vet/vet.go
+++ b/libgo/go/cmd/go/internal/vet/vet.go
@@ -51,7 +51,9 @@ func runVet(cmd *base.Command, args []string) {
 
work.BuildInit()
work.VetFlags = vetFlags
-   work.VetExplicit = true
+   if len(vetFlags) > 0 {
+   work.VetExplicit = true
+   }
if vetTool != "" {
var err error
work.VetTool, err = filepath.Abs(vetTool)
diff --git a/libgo/go/cmd/go/internal/web/api.go 
b/libgo/go/cmd/go/internal/web/api.go
index ad99eb2f8c3..209ed6861a1 100644
--- a/libgo/go/cmd/go/internal/web/api.go
+++ b/libgo/go/cmd/go/internal/web/api.go
@@ -3,7 +3,7 @@
 // license that can be found in the LICENSE file.
 
 // Package web defines minimal helper routines for accessing HTTP/HTTPS
-// resources without requiring external dependenicies on the net package.
+// resources without requiring external dependencies on the net package.
 //
 // If the cmd_go_bootstrap build tag is present, web avoids the use of the net
 // package and returns errors for all network operations.
diff --git a/libgo/go/cmd/go/testdata/script/mod_gobuild_import.txt 
b/libgo/go/cmd/go/testdata/script/mod_gobuild_import.txt
index ae05250c5f6..948496241e9 100644
--- a/libgo/go/cmd/go/testdata/script/mod_gobuild_import.txt
+++ b/libgo/go/cmd/go/testdata/script/mod_gobuild_import.txt
@@ -2,49 +2,67 @@
 
 # go/build's Import should find modules by invoking the go command
 
-go build -o $WORK/testimport.exe ./testimport
+go build -o $WORK ./testimport ./testfindonly
 
 # GO111MODULE=off
 env GO111MODULE=off
-! exec $WORK/testimport.exe gobuild.example.com/x/y/z/w .
+! exec $WORK/testimport$GOEXE gobuild.example.com/x/y/z/w .
 
 # GO111MODULE=auto in GOPATH/src
 env GO111MODULE=auto
-exec $WORK/testimport.exe gobuild.example.com/x/y/z/w .
+exec $WORK/testimport$GOEXE gobuild.example.com/x/y/z/w .
 
 # GO111MODULE=auto outside GOPATH/src
 cd $GOPATH/other
 env GO111MODULE=auto
-exec $WORK/testimport.exe other/x/y/z/w .
+exec $WORK/testimport$GOEXE other/x/y/z/w .
 stdout w2.go
 
-! exec $WORK/testimport.exe gobuild.example.com/x/y/z/w .
+! exec $WORK/testimport$GOEXE gobuild.example.com/x/y/z/w .
 stderr 'cannot find module providing package gobuild.example.com/x/y/z/w'
 
 cd z
-exec

[PATCH] c++: Fix ICE with invalid array bounds [PR93789]

2020-02-26 Thread Marek Polacek
r7-2111 introduced maybe_constant_value in cp_fully_fold.
maybe_constant_value uses cxx_eval_outermost_constant_expr, which
can clear TREE_CONSTANT:
6510   else if (non_constant_p && TREE_CONSTANT (r))
[...]
6529   TREE_CONSTANT (r) = false;

In this test the array size is '(long int) "h"'.  This used to be
TREE_CONSTANT but given the change above, the flag will be cleared
when we cp_fully_fold the array size in compute_array_index_type_loc.
That means we don't emit an error in the
10391   else if (TREE_CONSTANT (size)
block in the same function, and we go on.  Then we compute ITYPE
using cp_build_binary_op and use maybe_constant_value on it and
suddenly we have something that's TREE_CONSTANT again.  And then we
crash in reshape_init_array_1 in tree_to_uhwi, because what we have
doesn't fit in an unsigned HWI.

icc accepts this code, but since we used to reject it, I see no desire
to make this work, so I've added a check that triggers when we end up
with a constant that is not an INTEGER_CST.

Bootstrapped/regtested on x86_64-linux, ok for trunk?  Probably no need
to change released versions.

2020-02-26  Marek Polacek  

PR c++/93789 - ICE with invalid array bounds.
* decl.c (compute_array_index_type_loc): Give an error
when ITYPE ends up being a constant that is not an INTEGER_CST.

* g++.dg/ext/vla22.C: New test.
---
 gcc/cp/decl.c|  7 +++
 gcc/testsuite/g++.dg/ext/vla22.C | 10 ++
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/vla22.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1947c4ddb7f..4ac29d99d04 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10473,6 +10473,13 @@ compute_array_index_type_loc (location_t name_loc, 
tree name, tree size,
  error ("overflow in array dimension");
  TREE_OVERFLOW (itype) = 0;
}
+  else if (TREE_CODE (itype) != INTEGER_CST)
+   {
+ if (!(complain & tf_error))
+   return error_mark_node;
+ invalid_array_size_error (loc, cst_size_not_constant, itype, name);
+ itype = integer_one_node;
+   }
 }
 
   /* Create and return the appropriate index type.  */
diff --git a/gcc/testsuite/g++.dg/ext/vla22.C b/gcc/testsuite/g++.dg/ext/vla22.C
new file mode 100644
index 000..22e200b49cf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/vla22.C
@@ -0,0 +1,10 @@
+// PR c++/93789 - ICE with invalid array bounds.
+// { dg-do compile }
+// { dg-options "" }
+
+void
+f ()
+{
+  const int tbl[(long) "h"] = { 12 }; // { dg-error "size of array .tbl. is 
not a constant expression" "" { target c++11 } }
+// { dg-error "size of array .tbl. is not an integral constant-expression" "" 
{ target c++98_only } .-1 }
+}

base-commit: 051b9873e78fe1acb1a3fecd0c6e5685b6c12fb3
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



Re: work-around system header namespace pollution

2020-02-26 Thread Jeff Law
On Tue, 2020-02-25 at 17:45 -0300, Alexandre Oliva wrote:
> Including limits.h on vxworks pollutes the global namespace with SH
> and many other platform names; work around it.
> 
> Tested on the affected platform.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.target/aarch64/vect-mull.x (SH): Undefine.
OK
jeff



Re: define NO_DOT_IN_LABEL only in vxworks6

2020-02-26 Thread Jeff Law
On Tue, 2020-02-25 at 17:47 -0300, Alexandre Oliva wrote:
> There was a mistake in forward-porting and contributing some
> vxworks7r2 changes, that caused a conditional to be dropped around a
> couple of preprocessor directives, needed only up to vxworks6, that
> change the compiler's behavior WRT introducing dollars and dots in
> symbol names.
> 
> This deviates GCC's behavior from the native system compiler, in a way
> that appears to have ABI implications, so we'd like to correct that,
> even at this late stage in the development cycle.
> 
> Tested with x86_64-linux-gnu-x-aarch64-vxworks7r2.  Ok to install?
> 
> 
> for  gcc/ChangeLog
> 
>   * config/vx-common.h (NO_DOLLAR_IN_LABEL, NO_DOT_IN_LABEL): Leave
>   them alone on vx7.
OK
jeff
> 



Re: structurally compare type_arg_packs [93933]

2020-02-26 Thread Jason Merrill

On 2/25/20 4:09 PM, Nathan Sidwell wrote:
We consider all TYPE_ARGUMENT_PACKS distinct types, leading to problems 
with redeclarations.



This patch fixes things by:
a) marking all such types for structural comparison
b) teaching structural_comptypes how to compare them.

1) It appears that NONTYPE_ARGUMENT_PACKS just work, I think because we 
just compare TREE_OPERANDs, which dtrt.


2) I don't think the ARGUMENT_PACK_INCOMPLETE_P and 
ARGUMENT_PACK_EXPLICIT_ARGS affect the comparison, (icbw?)


I'll apply to trunk shortly, unless someone indicates I'm confused.


I'd think that the bug is that we're treating them as types in the first 
place; they aren't types, so they shouldn't reach comptypes.  I'd lean 
toward adding an assert to that effect and fixing the caller to use e.g. 
template_args_equal.


Jason



Re: [PATCH] use pointer size rather than array size when storing the former (PR 93829)

2020-02-26 Thread Jeff Law
On Wed, 2020-02-19 at 17:26 -0700, Martin Sebor wrote:
> The buffer overflow detection for multi-char stores uses the size
> of a source array even when what's actually being accessed (read
> and stored) is a pointer to the array.  That leads to incorrect
> warnings in some cases.
> 
> The attached patch corrects the function that computes the size of
> the access to set it to that of a pointer instead if the source is
> an address expression.
> 
> Tested on x86_64-linux.

>if (TREE_CODE (exp) == ADDR_EXPR)
> -exp = TREE_OPERAND (exp, 0);
> +{
> +  /* If the size of the access hasn't been determined yet it's that
> +  of a pointer.  */
> +  if (!nbytes)
> + nbytes = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (exp)));
> +  exp = TREE_OPERAND (exp, 0);
> +}
>  
This doesn't make any sense to me.  You're always going to get the size of a
pointer here.  Don't you want the size of the TYPE of the operand?


Jeff



GCC 8.4 Status Report (2020-02-26)

2020-02-26 Thread Jakub Jelinek
Status
======

The GCC 8 branch is now frozen for blocking regressions and documentation
fixes only, all changes to the branch require a RM approval now.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                0   -   2
P2              258   -  26
P3               36   -   2
P4              155   +   4
P5               22
--------        ---   -----------------------
Total P1-P3     294   -  30
Total           471   -  26


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2019-02/msg00122.html



[PATCH], PR target/93932, Do not use input_operand for variable vector extract insns on PowerPC

2020-02-26 Thread Michael Meissner
As part of improving the vector extract built-ins for the future processor, and
also looking at optimizing all of the cases with vector extracts and sign,
zero, float extension for GCC 11 (PR target/93230), I noticed that the code
generated for some of the tests had regressed from the GCC 8 time frame.

What is happening is in some instances, we want to do a vector extract with a
variable element where the vector is in a register:

 #include <altivec.h>

long long
foo (vector long long v, unsigned long n)
{
  return vec_extract (v, n);
}

During the reload pass, the register allocator decides that it should spill the
insn to the stack, and then do the vector extract from memory (which is an
optimization to prevent loading the vector in case we only want one element).

I came to the conclusion that having one insn that did both the variable vector
extract on a register and from memory, allowed this bad code to be generated.

I split the 3 variable vector extract insn patterns that used input_operand to
handle either type, to have one insn that handled a register, and a second insn
that handles only memory.

Note, there is a 4th place that uses input_operand for variable vector extracts
that is not touched by this patch.  The insn is trying to combine a variable
vector extract with a zero extend operation.  That insn pattern never will
match due to the way the insn pattern is written.  I will be submitting patches
to fix this insn separately (PR target/93937), and that patch will split the
insn into two separate insns.

I have built this code on both a little endian system and a big endian PowerPC
system, and I ran make check for C, C++, Fortran, and LTO.  There were no
regressions after applying the patch for the test suite.  Can I check this into
the master branch?

Because it also fails on the GCC 9 branch, I will be submitting patches for
that.  Due to the changes between GCC 9 and the current master, the patch in
this mail can not be blindly applied, but I have a patch for GCC 9.  GCC 8
generates the correct code, so it does not have to be fixed.

[gcc]
2020-02-26  Michael Meissner  

PR target/93932
* config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator):
Split the insn into two parts.  This insn only does variable
extract from a register.
(vsx_extract__var_load, VSX_D iterator): New insn, do
variable extract from memory.
(vsx_extract_v4sf_var): Split the insn into two parts.  This insn
only does variable extract from a register.
(vsx_extract_v4sf_var_load): New insn, do variable extract from
memory.
(vsx_extract__var, VSX_EXTRACT_I iterator): Split the insn
into two parts.  This insn only does variable extract from a
register.
(vsx_extract__var_load, VSX_EXTRACT_I iterator): New insn,
do variable extract from memory.

[gcc/testsuite]
2020-02-26  Michael Meissner  

PR target/93932
* gcc.target/powerpc/fold-vec-extract-longlong.p8.c: Adjust
instruction counts.

--- /tmp/TMHdwO_vsx.md  2020-02-26 13:25:27.250209645 -0500
+++ gcc/config/rs6000/vsx.md	2020-02-26 13:25:21.357125563 -0500
@@ -3245,14 +3245,14 @@ (define_insn "vsx_vslo_"
   "vslo %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-;; Variable V2DI/V2DF extract
+;; Variable V2DI/V2DF extract from a register
 (define_insn_and_split "vsx_extract__var"
-  [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r")
-   (unspec: [(match_operand:VSX_D 1 "input_operand" "v,Q,Q")
-(match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+  [(set (match_operand: 0 "gpc_reg_operand" "=v")
+   (unspec: [(match_operand:VSX_D 1 "gpc_reg_operand" "v")
+(match_operand:DI 2 "gpc_reg_operand" "r")]
UNSPEC_VSX_EXTRACT))
-   (clobber (match_scratch:DI 3 "=r,&b,&b"))
-   (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
+   (clobber (match_scratch:DI 3 "=r"))
+   (clobber (match_scratch:V2DI 4 "=&v"))]
   "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"
@@ -3263,6 +3263,23 @@
+(define_insn_and_split "vsx_extract__var_load"
+  [(set (match_operand: 0 "gpc_reg_operand" "=wa,r")
+   (unspec: [(match_operand:VSX_D 1 "memory_operand" "Q,Q")
+(match_operand:DI 2 "gpc_reg_operand" "r,r")]
+   UNSPEC_VSX_EXTRACT))
+   (clobber (match_scratch:DI 3 "=&b,&b"))]
+  "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], 
operands[2],
+  operands[3], mode);
+}
+  [(set_attr "type" "fpload,load")])
+
 ;; Extract a SF element from V4SF
 (define_insn_and_split "vsx_extract_v4sf"
   [(set (match_operand:SF 0 "vsx_register_operand" "=wa")
@@ -3315,14 +3332,14 @@ (d

[PATCH] maintainer-scripts: Speed up git clone in gcc_release

2020-02-26 Thread Jakub Jelinek
Hi!

When doing the 8.4-rc1, I've noticed (probably also because of the dying
disk on sourceware) that git clone is extremely slow, and furthermore when
all of us have some local snapshots, it is a waste of resources to download
everything again.  Especially for the -f runs when we'll need to wait until
git tag -s asks us for a gpg password interactively.

The following patch adds an option through which one can point the script
at a local gcc .git directory from which it can --dissociate --reference ...
during cloning to speed it up.

Tested with the 8.4-rc1, ok for trunk and after a while release branches?

2020-02-26  Jakub Jelinek  

* gcc_release: Add support for -b local-git-repo argument.

--- maintainer-scripts/gcc_release.jj   2020-01-13 13:53:24.158187355 +0100
+++ maintainer-scripts/gcc_release  2020-02-26 22:43:34.286172860 +0100
@@ -9,7 +9,7 @@
 # Contents:
 #   Script to create a GCC release.
 #
-# Copyright (c) 2001-2018 Free Software Foundation.
+# Copyright (c) 2001-2020 Free Software Foundation.
 #
 # This file is part of GCC.
 #
@@ -78,6 +78,7 @@ Options:
   -p previous-tarball  Location of a previous tarball (to generate diff files).
   -t tag   Tag to mark the release in git.
   -u username  Username for upload operations.
+  -b local-git-repoLocal git repository to speed up cloning.
 EOF
 exit 1
 }
@@ -103,8 +104,14 @@ build_sources() {
   changedir "${WORKING_DIRECTORY}"
 
   # Check out the sources.
-  ${GIT} clone -q -b "${GITBRANCH}" "${GITROOT}" "`basename 
${SOURCE_DIRECTORY}`" || \
-  error "Could not check out release sources"
+  if [ -n "${GIT_REFERENCE}" ]; then
+${GIT} clone -q --dissociate --reference "${GIT_REFERENCE}" \
+-b "${GITBRANCH}" "${GITROOT}" "`basename 
${SOURCE_DIRECTORY}`" || \
+error "Could not check out release sources"
+  else
+${GIT} clone -q -b "${GITBRANCH}" "${GITROOT}" "`basename 
${SOURCE_DIRECTORY}`" || \
+error "Could not check out release sources"
+  fi
 
   # If this is a final release, make sure that the ChangeLogs
   # and version strings are updated.
@@ -567,6 +574,9 @@ TAG=""
 # The old tarballs from which to generate diffs.
 OLD_TARS=""
 
+# Local gcc git checkout to speed up git cloning.
+GIT_REFERENCE=""
+
 # The directory that will be used to construct the release.  The
 # release itself will be placed in a subdirectory of this directory.
 DESTINATION=${HOME}
@@ -613,7 +623,7 @@ TAR="${TAR:-tar}"
 
 
 # Parse the options.
-while getopts "d:fr:u:t:p:s:l" ARG; do
+while getopts "d:fr:u:t:p:s:lb:" ARG; do
 case $ARG in
 d)DESTINATION="${OPTARG}";;
 r)RELEASE="${OPTARG}";;
@@ -631,6 +641,7 @@ while getopts "d:fr:u:t:p:s:l" ARG; do
   if [ ! -f ${OPTARG} ]; then
error "-p argument must name a tarball"
  fi;;
+b)GIT_REFERENCE="${OPTARG}";;
 \?)   usage;;
 esac
 done

Jakub



Re: [PATCH 01/10] i386: Properly encode vector registers in vector move

2020-02-26 Thread Jeff Law
On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> On x86, when AVX and AVX512 are enabled, vector move instructions can
> be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> 
>0: c5 f9 6f d1 vmovdqa %xmm1,%xmm2
>4: 62 f1 fd 08 6f d1   vmovdqa64 %xmm1,%xmm2
> 
> We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> indicate target preferences of vector move encoding.  For vector register
> to vector register move, we can use 512-bit vector move instructions to
> move 128-bit/256-bit vector if AVX512VL isn't available.  With AVX512F
> and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
> if upper 16 vector registers aren't used.  This patch adds a function,
> ix86_output_ssemov, to generate vector moves:
> 
> 1. If zmm registers are used, use EVEX encoding.
> 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> will be generated.
> 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
>a. With AVX512VL, AVX512VL vector moves will be generated.
>b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
>   move will be done with zmm register move.
> 
> 
[ ... ]

>  
> +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> +   TARGET_AVX512VL or it is a register to register move which can
> +   be done with zmm register move. */
> +
> +static const char *
> +ix86_get_ssemov (rtx *operands, unsigned size,
> +  enum attr_mode insn_mode, machine_mode mode)
> +{
> +  char buf[128];
> +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> +|| misaligned_operand (operands[1], mode));
> +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> +  || EXT_REX_SSE_REG_P (operands[1]));
> +  machine_mode scalar_mode;
> +
> +  else if (SCALAR_INT_MODE_P (scalar_mode))
> +{
> +  switch (scalar_mode)
> + {
> + case E_QImode:
> +   if (size == 64)
> + opcode = (misaligned_p
> +   ? (TARGET_AVX512BW
> +  ? "vmovdqu8"
> +  : "vmovdqu64")
> +   : "vmovdqa64");
> +   else if (evex_reg_p)
> + {
> +   if (TARGET_AVX512VL)
> + opcode = (misaligned_p
> +   ? (TARGET_AVX512BW
> +  ? "vmovdqu8"
> +  : "vmovdqu64")
> +   : "vmovdqa64");
> + }
> +   else
> + opcode = (misaligned_p
> +   ? (TARGET_AVX512BW
> +  ? "vmovdqu8"
> +  : "%vmovdqu")
> +   : "%vmovdqa");
> +   break;
> + case E_HImode:
> +   if (size == 64)
> + opcode = (misaligned_p
> +   ? (TARGET_AVX512BW
> +  ? "vmovdqu16"
> +  : "vmovdqu64")
> +   : "vmovdqa64");
> +   else if (evex_reg_p)
> + {
> +   if (TARGET_AVX512VL)
> + opcode = (misaligned_p
> +   ? (TARGET_AVX512BW
> +  ? "vmovdqu16"
> +  : "vmovdqu64")
> +   : "vmovdqa64");
> + }
> +   else
> + opcode = (misaligned_p
> +   ? (TARGET_AVX512BW
> +  ? "vmovdqu16"
> +  : "%vmovdqu")
> +   : "%vmovdqa");
> +   break;
> + case E_SImode:
> +   if (size == 64)
> + opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> +   else if (evex_reg_p)
> + {
> +   if (TARGET_AVX512VL)
> + opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> + }
> +   else
> + opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> +   break;
> + case E_DImode:
> + case E_TImode:
> + case E_OImode:
> +   if (size == 64)
> + opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> +   else if (evex_reg_p)
> + {
> +   if (TARGET_AVX512VL)
> + opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> + }
> +   else
> + opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> +   break;
> + case E_XImode:
> +   opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> +   break;
> + default:
> +   gcc_unreachable ();
> + }
> +}
> +  else
> +gcc_unreachable ();
> +
> +  if (!opcode)
> +{
> +  /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
> +  registers without AVX512VL by using zmm register move.  */
So the overall flow control in here is rather convoluted.  I hate the way you
don't set OPCODE above and then do it down here.  I would suggest breaking 
the !opcode
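
To make the suggestion concrete, the repeated misaligned_p/evex_reg_p
ladders can be collapsed into one expression per case — a hypothetical
sketch of that kind of restructuring, not actual i386 backend code
(qi_move_opcode and the target_* variables are invented stand-ins for
the GCC target macros):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Invented stand-ins for TARGET_AVX512BW / TARGET_AVX512VL.  */
static int target_avx512bw = 1;
static int target_avx512vl = 1;

/* Opcode choice for QImode-element vectors.  Returns NULL where the
   original code leaves OPCODE unset, i.e. the zmm fallback path for
   EVEX registers without AVX512VL.  */
static const char *
qi_move_opcode (unsigned size, int misaligned_p, int evex_reg_p)
{
  /* 512-bit moves and EVEX registers with AVX512VL pick the same
     mnemonics, so the two arms of the original ladder merge.  */
  if (size == 64 || (evex_reg_p && target_avx512vl))
    return misaligned_p
	   ? (target_avx512bw ? "vmovdqu8" : "vmovdqu64")
	   : "vmovdqa64";
  if (evex_reg_p)
    return NULL;
  return misaligned_p
	 ? (target_avx512bw ? "vmovdqu8" : "%vmovdqu")
	 : "%vmovdqa";
}
```

Whether a lookup table or merged conditionals fits the backend better is
a judgment call; the point is that OPCODE is assigned exactly once per
path instead of being left unset and patched up later.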

[PATCH] handle attribute format on functions without prototype (PR 93812)

2020-02-26 Thread Martin Sebor

GCC accepts attribute format even on functions without a prototype.
However, when such a function is subsequently redeclared with
a prototype the attribute validation runs into trouble.  To avoid
the ICE I considered a) dropping the attribute on functions declared
without a prototype (like Clang does) and b) dropping the attribute
only when the function is subsequently redeclared with a prototype.
Since (b) is slightly safer in terms of retaining potentially useful
functionality the attached patch implements it and verifies that
the attribute is still accepted on functions without a prototype
and has the expected effect.

Tested on x86_64-linux.  I propose this for both GCC 10 and 9.

Martin

PS The solution causes a redundant warning for each instance of
a redeclaration with a prototype of a function without a prototype
but with attribute format.  Since this seems like a rare problem
I don't think it's worth going to the trouble of making sure only
one warning is issued.

PPS I don't think it's intentional that GCC accepts attribute format
on functions without a prototype, first because it normally requires
that the format argument be a char*, and second because I could find
no tests for the "feature" in the test suite.  It would make sense
to me reject such declarations but I don't think now is a good time
for such a change.
PR c/93812 - ICE on redeclaration of an attribute format function without prototype

gcc/c/ChangeLog:

	PR c/93812
	* c-typeck.c (build_functype_attribute_variant): New function.
	(composite_type): Call it.

gcc/testsuite/ChangeLog:

	PR c/93812
	* gcc.dg/format/proto.c: New test.

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 8df0849bd8b..308fcffcfb0 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -353,7 +353,28 @@ c_vla_type_p (const_tree t)
 return true;
   return false;
 }
-
+
+/* If NTYPE is a type of a non-variadic function with a prototype
+   and OTYPE is a type of a function without a prototype and ATTRS
+   contains attribute format, diagnoses and removes it from ATTRS.
+   Returns the result of build_type_attribute_variant of NTYPE and
+   the (possibly) modified ATTRS.  */
+
+static tree
+build_functype_attribute_variant (tree ntype, tree otype, tree attrs)
+{
+  if (!prototype_p (otype)
+  && prototype_p (ntype)
+  && lookup_attribute ("format", attrs))
+{
+  warning_at (input_location, OPT_Wattributes,
+		  "%qs attribute cannot be applied to a function that "
+		  "does not take variable arguments", "format");
+  attrs = remove_attribute ("format", attrs);
+}
+  return build_type_attribute_variant (ntype, attrs);
+
+}
 /* Return the composite type of two compatible types.
 
We assume that comptypes has already been done and returned
@@ -504,9 +525,9 @@ composite_type (tree t1, tree t2)
 
 	/* Save space: see if the result is identical to one of the args.  */
 	if (valtype == TREE_TYPE (t1) && !TYPE_ARG_TYPES (t2))
-	  return build_type_attribute_variant (t1, attributes);
+	  return build_functype_attribute_variant (t1, t2, attributes);
 	if (valtype == TREE_TYPE (t2) && !TYPE_ARG_TYPES (t1))
-	  return build_type_attribute_variant (t2, attributes);
+	  return build_functype_attribute_variant (t2, t1, attributes);
 
 	/* Simple way if one arg fails to specify argument types.  */
 	if (TYPE_ARG_TYPES (t1) == NULL_TREE)
diff --git a/gcc/testsuite/gcc.dg/format/proto.c b/gcc/testsuite/gcc.dg/format/proto.c
new file mode 100644
index 000..b2050c9de5b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/format/proto.c
@@ -0,0 +1,89 @@
+/* PR c/93812 - ICE on redeclaration of an attribute format function without
+   prototype
+   It's not clear that attribute format should be accepted on functions
+   without a prototype.  If it's decided that it shouldn't be the tests
+   here will need to be adjusted.
+   { dg-do compile }
+   { dg-options "-Wall" } */
+
+#define FMT(n1, n2) __attribute__((__format__(__printf__, n1, n2)))
+
+// Exercise function declarations.
+FMT (1, 2) void print1 ();
+
+FMT (2, 3) void print2 ();
+   void print2 ();
+
+FMT (3, 4) void print3 ();
+FMT (3, 4) void print3 ();
+
+FMT (1, 2) void print4 ();
+   void print4 (void);  // { dg-warning "'format' attribute cannot be applied to a function that does not take variable arguments" }
+
+   void print5 ();
+FMT (1, 2) void print5 (void);  // { dg-warning "\\\[-Wattributes" }
+
+FMT (1, 2) void print6 ();
+  void print6 (const char*, ...);   // { dg-error "conflicting types" }
+
+   void print7 (const char*, ...);
+FMT (1, 2) void print7 ();  // { dg-error "conflicting types" }
+
+
+// Exercise function calls.
+void test_print (void)
+{
+  print1 ("%i %s", 123, "");
+  print1 ("%s %i", 123, 123);   // { dg-warning "\\\[-Wformat" }
+
+  print2 (0, "%s %i", "", 123);
+  print2 (1, "%i %s", "", 123); // { dg-warning "\\\[-Wformat" }
+
+  print3 (0, 1, "%s %i", "", 123);
+ 

Re: [PATCH] c++: Fix ICE with invalid array bounds [PR93789]

2020-02-26 Thread Jason Merrill

On 2/26/20 3:44 PM, Marek Polacek wrote:

r7-2111 introduced maybe_constant_value in cp_fully_fold.
maybe_constant_value uses cxx_eval_outermost_constant_expr, which
can clear TREE_CONSTANT:
6510   else if (non_constant_p && TREE_CONSTANT (r))
[...]
6529   TREE_CONSTANT (r) = false;

In this test the array size is '(long int) "h"'.  This used to be
TREE_CONSTANT but given the change above, the flag will be cleared
when we cp_fully_fold the array size in compute_array_index_type_loc.


I wonder about giving an error at that point; if size is TREE_CONSTANT 
and folded is not, we're in a strange situation already.



That means we don't emit an error in the
10391   else if (TREE_CONSTANT (size)
block in the same function, and we go on.  Then we compute ITYPE
using cp_build_binary_op and use maybe_constant_value on it and
suddenly we have something that's TREE_CONSTANT again.


Why does maybe_constant_value consider (long)"h" - 1 to be a 
constant-expression?



And then we
crash in reshape_init_array_1 in tree_to_uhwi, because what we have
doesn't fit in an unsigned HWI.

icc accepts this code, but since we used to reject it, I see no desire
to make this work, so I've added a check that triggers when we end up
with a constant that is not an INTEGER_CST.

Bootstrapped/regtested on x86_64-linux, ok for trunk?  Probably no need
to change released versions.

2020-02-26  Marek Polacek  

PR c++/93789 - ICE with invalid array bounds.
* decl.c (compute_array_index_type_loc): Give an error
when ITYPE ends up being a constant that is not an INTEGER_CST.

* g++.dg/ext/vla22.C: New test.
---
  gcc/cp/decl.c|  7 +++
  gcc/testsuite/g++.dg/ext/vla22.C | 10 ++
  2 files changed, 17 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/ext/vla22.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1947c4ddb7f..4ac29d99d04 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10473,6 +10473,13 @@ compute_array_index_type_loc (location_t name_loc, 
tree name, tree size,
  error ("overflow in array dimension");
  TREE_OVERFLOW (itype) = 0;
}
+  else if (TREE_CODE (itype) != INTEGER_CST)
+   {
+ if (!(complain & tf_error))
+   return error_mark_node;
+ invalid_array_size_error (loc, cst_size_not_constant, itype, name);
+ itype = integer_one_node;
+   }
  }
  
/* Create and return the appropriate index type.  */

diff --git a/gcc/testsuite/g++.dg/ext/vla22.C b/gcc/testsuite/g++.dg/ext/vla22.C
new file mode 100644
index 000..22e200b49cf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/vla22.C
@@ -0,0 +1,10 @@
+// PR c++/93789 - ICE with invalid array bounds.
+// { dg-do compile }
+// { dg-options "" }
+
+void
+f ()
+{
+  const int tbl[(long) "h"] = { 12 }; // { dg-error "size of array .tbl. is not a constant 
expression" "" { target c++11 } }
+// { dg-error "size of array .tbl. is not an integral constant-expression" "" 
{ target c++98_only } .-1 }
+}

base-commit: 051b9873e78fe1acb1a3fecd0c6e5685b6c12fb3





Re: [PATCH] use pointer size rather than array size when storing the former (PR 93829)

2020-02-26 Thread Martin Sebor

On 2/26/20 3:09 PM, Jeff Law wrote:

On Wed, 2020-02-19 at 17:26 -0700, Martin Sebor wrote:

The buffer overflow detection for multi-char stores uses the size
of a source array even when what's actually being accessed (read
and stored) is a pointer to the array.  That leads to incorrect
warnings in some cases.

The attached patch corrects the function that computes the size of
the access to set it to that of a pointer instead if the source is
an address expression.

Tested on x86_64-linux.



if (TREE_CODE (exp) == ADDR_EXPR)
-exp = TREE_OPERAND (exp, 0);
+{
+  /* If the size of the access hasn't been determined yet it's that
+of a pointer.  */
+  if (!nbytes)
+   nbytes = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (exp)));
+  exp = TREE_OPERAND (exp, 0);
+}
  

This doesn't make any sense to me.  You're always going to get the size of a
pointer here.  Don't you want the size of the TYPE of the operand?


The function correctly handles cases when the expression is an array
because the array is what's being stored.  The bug is in stripping
the ADDR_EXPR and then using the size of the operand when what's
being stored is actually a pointer.

This test case illustrates the difference:

  struct S { char *p, *q, r[8]; } a;

  void f (void)
  {
struct S b = { 0, "1234567", "abcdefgh" };
__builtin_memcpy (&a, &b, sizeof a);
  }

The memcpy results in:

  MEM <char *> [(char * {ref-all})&b] = 0B;
  MEM <char *> [(char * {ref-all})&b + 8B] = "1234567";
  MEM <char[24]> [(char * {ref-all})&a] = MEM <char[24]> [(char * {ref-all})&b];


The RHS of the second MEM_REF is ADDR_EXPR (STRING_CST).  The RHS
of the third MEM_REF is the  array case (that's handled correctly).

I think computing the size of the store could be simplified (it
seems it should be the same as type of either the RHS or the LHS)
but I didn't want to mess with things too much this late.

Martin


Re: [committed] libstdc++: Add __maybe_const_t and __maybe_empty_t aliases

2020-02-26 Thread Jonathan Wakely
On Wed, 26 Feb 2020 at 18:31, Daniel Krügler wrote:
>
> Am Mi., 26. Feb. 2020 um 16:20 Uhr schrieb Jonathan Wakely 
> :
> >
> > This introduces a couple of convenience alias templates to be used for
> > some repeated patterns using std::conditional_t.
>
> I find it a bit confusing/inconsistent to define __maybe_const_t such
> that the bool argument says "is const", while for __maybe_empty_t the
> corresponding bool argument says "is not empty". Suggestion: Implement
> __maybe_empty_t such that its bool argument says "is empty" or change
> the alias to __maybe_nonempty_t.

Good point, I'll make that change. Thanks.
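
For readers without the patch in front of them, the aliases under
discussion are roughly the following — a hypothetical sketch, not the
actual libstdc++ definitions, which may differ in detail:

```cpp
#include <type_traits>

namespace sketch
{
  // The bool argument says "make it const".
  template<bool _Const, typename _Tp>
    using __maybe_const_t = std::conditional_t<_Const, const _Tp, _Tp>;

  struct __empty_type { };

  // As sketched here the bool argument says "is not empty", which is
  // the inconsistency Daniel points out; flipping its sense or renaming
  // the alias to __maybe_nonempty_t resolves it.
  template<bool _NotEmpty, typename _Tp>
    using __maybe_empty_t = std::conditional_t<_NotEmpty, _Tp, __empty_type>;
}
```

Either alias just packages a std::conditional_t pattern that otherwise
gets repeated inline at every use site.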


Re: [PATCH 01/10] i386: Properly encode vector registers in vector move

2020-02-26 Thread H.J. Lu
On Wed, Feb 26, 2020 at 2:42 PM Jeff Law  wrote:
>
> On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> > On x86, when AVX and AVX512 are enabled, vector move instructions can
> > be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> >
> >0: c5 f9 6f d1 vmovdqa %xmm1,%xmm2
> >4: 62 f1 fd 08 6f d1   vmovdqa64 %xmm1,%xmm2
> >
> > We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> > only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> > and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> > indicate target preferences of vector move encoding.  For vector register
> > to vector register move, we can use 512-bit vector move instructions to
> > move 128-bit/256-bit vector if AVX512VL isn't available.  With AVX512F
> > and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
> > if upper 16 vector registers aren't used.  This patch adds a function,
> > ix86_output_ssemov, to generate vector moves:
> >
> > 1. If zmm registers are used, use EVEX encoding.
> > 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> > will be generated.
> > 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
> >a. With AVX512VL, AVX512VL vector moves will be generated.
> >b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> >   move will be done with zmm register move.
> >
> >
> [ ... ]
>
> >
> > +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> > +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> > +   TARGET_AVX512VL or it is a register to register move which can
> > +   be done with zmm register move. */
> > +
> > +static const char *
> > +ix86_get_ssemov (rtx *operands, unsigned size,
> > +  enum attr_mode insn_mode, machine_mode mode)
> > +{
> > +  char buf[128];
> > +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> > +|| misaligned_operand (operands[1], mode));
> > +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> > +  || EXT_REX_SSE_REG_P (operands[1]));
> > +  machine_mode scalar_mode;
> > +
> > +  else if (SCALAR_INT_MODE_P (scalar_mode))
> > +{
> > +  switch (scalar_mode)
> > + {
> > + case E_QImode:
> > +   if (size == 64)
> > + opcode = (misaligned_p
> > +   ? (TARGET_AVX512BW
> > +  ? "vmovdqu8"
> > +  : "vmovdqu64")
> > +   : "vmovdqa64");
> > +   else if (evex_reg_p)
> > + {
> > +   if (TARGET_AVX512VL)
> > + opcode = (misaligned_p
> > +   ? (TARGET_AVX512BW
> > +  ? "vmovdqu8"
> > +  : "vmovdqu64")
> > +   : "vmovdqa64");
> > + }
> > +   else
> > + opcode = (misaligned_p
> > +   ? (TARGET_AVX512BW
> > +  ? "vmovdqu8"
> > +  : "%vmovdqu")
> > +   : "%vmovdqa");
> > +   break;
> > + case E_HImode:
> > +   if (size == 64)
> > + opcode = (misaligned_p
> > +   ? (TARGET_AVX512BW
> > +  ? "vmovdqu16"
> > +  : "vmovdqu64")
> > +   : "vmovdqa64");
> > +   else if (evex_reg_p)
> > + {
> > +   if (TARGET_AVX512VL)
> > + opcode = (misaligned_p
> > +   ? (TARGET_AVX512BW
> > +  ? "vmovdqu16"
> > +  : "vmovdqu64")
> > +   : "vmovdqa64");
> > + }
> > +   else
> > + opcode = (misaligned_p
> > +   ? (TARGET_AVX512BW
> > +  ? "vmovdqu16"
> > +  : "%vmovdqu")
> > +   : "%vmovdqa");
> > +   break;
> > + case E_SImode:
> > +   if (size == 64)
> > + opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > +   else if (evex_reg_p)
> > + {
> > +   if (TARGET_AVX512VL)
> > + opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > + }
> > +   else
> > + opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > +   break;
> > + case E_DImode:
> > + case E_TImode:
> > + case E_OImode:
> > +   if (size == 64)
> > + opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > +   else if (evex_reg_p)
> > + {
> > +   if (TARGET_AVX512VL)
> > + opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > + }
> > +   else
> > + opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > +   break;
> > + case E_XImode:
> > +   opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > +   break;
> > + default:
> > +   gcc_unreachable ();
> > + }
> > +}
> > +  else
> > +gcc_unreachable ();
> > +
> > +  if (!opcode)
> > +{
>

Re: [PATCH 01/10] i386: Properly encode vector registers in vector move

2020-02-26 Thread Jeff Law
On Wed, 2020-02-26 at 16:02 -0800, H.J. Lu wrote:
> On Wed, Feb 26, 2020 at 2:42 PM Jeff Law  wrote:
> > On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> > > On x86, when AVX and AVX512 are enabled, vector move instructions can
> > > be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> > > 
> > >0: c5 f9 6f d1 vmovdqa %xmm1,%xmm2
> > >4: 62 f1 fd 08 6f d1   vmovdqa64 %xmm1,%xmm2
> > > 
> > > We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> > > only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> > > and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> > > indicate target preferences of vector move encoding.  For vector register
> > > to vector register move, we can use 512-bit vector move instructions to
> > > move 128-bit/256-bit vector if AVX512VL isn't available.  With AVX512F
> > > and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
> > > if upper 16 vector registers aren't used.  This patch adds a function,
> > > ix86_output_ssemov, to generate vector moves:
> > > 
> > > 1. If zmm registers are used, use EVEX encoding.
> > > 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> > > will be generated.
> > > 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
> > >a. With AVX512VL, AVX512VL vector moves will be generated.
> > >b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> > >   move will be done with zmm register move.
> > > 
> > > 
> > [ ... ]
> > 
> > > +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> > > +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> > > +   TARGET_AVX512VL or it is a register to register move which can
> > > +   be done with zmm register move. */
> > > +
> > > +static const char *
> > > +ix86_get_ssemov (rtx *operands, unsigned size,
> > > +  enum attr_mode insn_mode, machine_mode mode)
> > > +{
> > > +  char buf[128];
> > > +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> > > +|| misaligned_operand (operands[1], mode));
> > > +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> > > +  || EXT_REX_SSE_REG_P (operands[1]));
> > > +  machine_mode scalar_mode;
> > > +
> > > +  else if (SCALAR_INT_MODE_P (scalar_mode))
> > > +{
> > > +  switch (scalar_mode)
> > > + {
> > > + case E_QImode:
> > > +   if (size == 64)
> > > + opcode = (misaligned_p
> > > +   ? (TARGET_AVX512BW
> > > +  ? "vmovdqu8"
> > > +  : "vmovdqu64")
> > > +   : "vmovdqa64");
> > > +   else if (evex_reg_p)
> > > + {
> > > +   if (TARGET_AVX512VL)
> > > + opcode = (misaligned_p
> > > +   ? (TARGET_AVX512BW
> > > +  ? "vmovdqu8"
> > > +  : "vmovdqu64")
> > > +   : "vmovdqa64");
> > > + }
> > > +   else
> > > + opcode = (misaligned_p
> > > +   ? (TARGET_AVX512BW
> > > +  ? "vmovdqu8"
> > > +  : "%vmovdqu")
> > > +   : "%vmovdqa");
> > > +   break;
> > > + case E_HImode:
> > > +   if (size == 64)
> > > + opcode = (misaligned_p
> > > +   ? (TARGET_AVX512BW
> > > +  ? "vmovdqu16"
> > > +  : "vmovdqu64")
> > > +   : "vmovdqa64");
> > > +   else if (evex_reg_p)
> > > + {
> > > +   if (TARGET_AVX512VL)
> > > + opcode = (misaligned_p
> > > +   ? (TARGET_AVX512BW
> > > +  ? "vmovdqu16"
> > > +  : "vmovdqu64")
> > > +   : "vmovdqa64");
> > > + }
> > > +   else
> > > + opcode = (misaligned_p
> > > +   ? (TARGET_AVX512BW
> > > +  ? "vmovdqu16"
> > > +  : "%vmovdqu")
> > > +   : "%vmovdqa");
> > > +   break;
> > > + case E_SImode:
> > > +   if (size == 64)
> > > + opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > +   else if (evex_reg_p)
> > > + {
> > > +   if (TARGET_AVX512VL)
> > > + opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > + }
> > > +   else
> > > + opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > +   break;
> > > + case E_DImode:
> > > + case E_TImode:
> > > + case E_OImode:
> > > +   if (size == 64)
> > > + opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > +   else if (evex_reg_p)
> > > + {
> > > +   if (TARGET_AVX512VL)
> > > + opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > + }
> > > +   else
> > > + opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";

Re: [PATCH] c++: Fix ICE with invalid array bounds [PR93789]

2020-02-26 Thread Martin Sebor

On 2/26/20 1:44 PM, Marek Polacek wrote:

r7-2111 introduced maybe_constant_value in cp_fully_fold.
maybe_constant_value uses cxx_eval_outermost_constant_expr, which
can clear TREE_CONSTANT:
6510   else if (non_constant_p && TREE_CONSTANT (r))
[...]
6529   TREE_CONSTANT (r) = false;

In this test the array size is '(long int) "h"'.  This used to be
TREE_CONSTANT but given the change above, the flag will be cleared
when we cp_fully_fold the array size in compute_array_index_type_loc.
That means we don't emit an error in the
10391   else if (TREE_CONSTANT (size)
block in the same function, and we go on.  Then we compute ITYPE
using cp_build_binary_op and use maybe_constant_value on it and
suddenly we have something that's TREE_CONSTANT again.  And then we
crash in reshape_init_array_1 in tree_to_uhwi, because what we have
doesn't fit in an unsigned HWI.

icc accepts this code, but since we used to reject it, I see no desire
to make this work, so I've added a check that triggers when we end up
with a constant that is not an INTEGER_CST.


G++ accepts this equivalent:

  void f () {
long n = ( long ) "h";
const int tbl [n] = { 12 };
...
  }

and it doesn't look like the patch changes that.

I think all these cases of initialized VLAs ought to be handled
the same way: either they should all be rejected or they should
all be accepted.  The rather dated pr58646, also a regression,
is another example of a few ICEs in this area.  See also this
recently submitted patch:
  https://gcc.gnu.org/ml/gcc-patches/2020-02/msg01148.html

FWIW, at one point I got VLA initialization to work again (r234966),
after it had been removed due to the last minute removal of VLAs
from the standard, but it caused trouble for Java and I was forced
to revert the changes.  I've been hoping to revisit it but other
things have been getting in the way.

Martin




Bootstrapped/regtested on x86_64-linux, ok for trunk?  Probably no need
to change released versions.

2020-02-26  Marek Polacek  

PR c++/93789 - ICE with invalid array bounds.
* decl.c (compute_array_index_type_loc): Give an error
when ITYPE ends up being a constant that is not an INTEGER_CST.

* g++.dg/ext/vla22.C: New test.
---
  gcc/cp/decl.c|  7 +++
  gcc/testsuite/g++.dg/ext/vla22.C | 10 ++
  2 files changed, 17 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/ext/vla22.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1947c4ddb7f..4ac29d99d04 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10473,6 +10473,13 @@ compute_array_index_type_loc (location_t name_loc, 
tree name, tree size,
  error ("overflow in array dimension");
  TREE_OVERFLOW (itype) = 0;
}
+  else if (TREE_CODE (itype) != INTEGER_CST)
+   {
+ if (!(complain & tf_error))
+   return error_mark_node;
+ invalid_array_size_error (loc, cst_size_not_constant, itype, name);
+ itype = integer_one_node;
+   }
  }
  
/* Create and return the appropriate index type.  */

diff --git a/gcc/testsuite/g++.dg/ext/vla22.C b/gcc/testsuite/g++.dg/ext/vla22.C
new file mode 100644
index 000..22e200b49cf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/vla22.C
@@ -0,0 +1,10 @@
+// PR c++/93789 - ICE with invalid array bounds.
+// { dg-do compile }
+// { dg-options "" }
+
+void
+f ()
+{
+  const int tbl[(long) "h"] = { 12 }; // { dg-error "size of array .tbl. is not a constant 
expression" "" { target c++11 } }
+// { dg-error "size of array .tbl. is not an integral constant-expression" "" 
{ target c++98_only } .-1 }
+}

base-commit: 051b9873e78fe1acb1a3fecd0c6e5685b6c12fb3


Re: [PATCH] maintainer-scripts: Speed up git clone in gcc_release

2020-02-26 Thread Joseph Myers
On Wed, 26 Feb 2020, Jakub Jelinek wrote:

> Hi!
> 
> When doing the 8.4-rc1, I've noticed (probably also because of the dying
> disk on sourceware) that git clone is extremely slow, and furthermore when
> all of us have some local snapshots, it is a waste of resources to download
> everything again.  Especially for the -f runs when we'll need to wait until
> git tag -s asks us for a gpg password interactively.
> 
> The following patch adds an option through which one can point the script
> at a local gcc .git directory from which it can --dissociate --reference ...
> during cloning to speed it up.
> 
> Tested with the 8.4-rc1, ok for trunk and after a while release branches?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com
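The clone speed-up being discussed can be sketched roughly as follows. This is a minimal stand-alone demo, not the actual gcc_release usage: the "upstream" and "local-snapshot" repositories here are hypothetical stand-ins created just for illustration.

```shell
# Rough sketch of cloning with a local reference repository.  The
# "upstream" and "local-snapshot" repositories are hypothetical
# stand-ins created for this demo.
set -e
work=$(mktemp -d)
cd "$work"

# A stand-in for the remote repository and a pre-existing local snapshot.
git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "initial commit"
git clone -q upstream local-snapshot

# The fast clone: objects are borrowed from local-snapshot instead of
# being fetched again; --dissociate then copies the borrowed objects so
# the new clone no longer depends on the reference repository.
git clone -q --dissociate --reference "$work/local-snapshot" \
    upstream release-checkout

git -C release-checkout log --oneline
```

In gcc_release the same effect is obtained by pointing the script's new option at an existing local gcc .git directory, so only objects missing from the local snapshot need to be downloaded.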


[committed] analyzer: fix ICE on unreachable calls [PR 93947]

2020-02-26 Thread David Malcolm
PR analyzer/93947 reports an ICE at -O1 when attempting to analyze a
call that has been optimized away as unreachable.

The root cause is a NULL dereference due to the fndecl having a NULL
cgraph_node: the cgraph_node was created by
pass_build_cgraph_edges::execute, but was later removed by
symbol_table::remove_unreachable_nodes before the analyzer pass.

This patch fixes it by checking for NULL before handling the
cgraph_node.

The reproducer demonstrates a weakness in the analyzer's constraint
handling, where region_model::apply_constraints_for_gswitch fails
to spot when the cases fully cover the data type, and thus make the
default impossible.  For now this is xfail-ed in the testcase.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r10-6879-g0ba70d1b5ae8df6406a880b2d23e4710b393e8c9.

gcc/analyzer/ChangeLog:
PR analyzer/93947
* region-model.cc (region_model::get_fndecl_for_call): Gracefully
fail for fn_decls that don't have a cgraph_node.

gcc/testsuite/ChangeLog:
PR analyzer/93947
* gcc.dg/analyzer/torture/pr93947.c: New test.
---
 gcc/analyzer/region-model.cc  |  6 ++-
 .../gcc.dg/analyzer/torture/pr93947.c | 40 +++
 2 files changed, 44 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/pr93947.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index a71884d7b11..b2179bd220a 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -6732,8 +6732,10 @@ region_model::get_fndecl_for_call (const gcall *call,
  tree fn_decl = code->get_tree_for_child_region (fn_rid);
  if (!fn_decl)
return NULL_TREE;
- const cgraph_node *ultimate_node
-   = cgraph_node::get (fn_decl)->ultimate_alias_target ();
+ cgraph_node *node = cgraph_node::get (fn_decl);
+ if (!node)
+   return NULL_TREE;
+ const cgraph_node *ultimate_node = node->ultimate_alias_target ();
  if (ultimate_node)
return ultimate_node->decl;
}
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/pr93947.c 
b/gcc/testsuite/gcc.dg/analyzer/torture/pr93947.c
new file mode 100644
index 000..73982ef4bd3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/pr93947.c
@@ -0,0 +1,40 @@
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } { "" } } */
+
+#include "../analyzer-decls.h"
+
+struct pf {
+  unsigned int iu : 2;
+};
+
+enum {
+  qr, jv, vm, mz,
+};
+
+int uh;
+
+void
+w9 (struct pf *x2)
+{
+  /* We ought to know the following based on the bitfield width.  */
+  __analyzer_eval (x2->iu >= 0 ); /* { dg-warning "TRUE" } */
+  __analyzer_eval (x2->iu < 4 ); /* { dg-warning "TRUE" } */
+
+  switch (x2->iu)
+{
+case qr:
+case jv:
+case vm:
+  uh = 0;
+  break;
+
+case mz:
+  break;
+
+default:
+  /* We ought to know from the enum values that this code is unreachable,
+and thus not print anything.
+TODO(xfail): currently this doesn't work.  */
+  __analyzer_eval (x2->iu); /* { dg-bogus "" "" { xfail *-*-* } } */
+  __builtin_abort ();
+}
+}
-- 
2.21.0



[committed] analyzer: fix ICE with -Wanalyzer-null-dereference [PR 93950]

2020-02-26 Thread David Malcolm
PR analyzer/93950 reports an ICE when pruning the path of a
-Wanalyzer-null-dereference diagnostic.

The root cause is a bug in the state-tracking code, in which the
variable of interest is tracked from the callee to a "nullptr" param
at the caller, whereupon we have an INTEGER_CST "variable", and
the attempt to look up its lvalue fails.

This code could use a rewrite; in the meantime this patch extends
the bulletproofing from g:8525d1f5f57b11fe04a97674cc2fc2b7727621d0
for PR analyzer/93544 to all of the various places where var can
be updated, fixing the ICE.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r10-6880-g71b633aaea3aac2d983da7b1b99da8c9a8c80d1a.

gcc/analyzer/ChangeLog:
PR analyzer/93950
* diagnostic-manager.cc
(diagnostic_manager::prune_for_sm_diagnostic): Assert that var is
either NULL or not a constant.  When updating var, bulletproof
against constant values.

gcc/testsuite/ChangeLog:
PR analyzer/93950
* g++.dg/analyzer/pr93950.C: New test.
---
 gcc/analyzer/diagnostic-manager.cc  | 16 ++
 gcc/testsuite/g++.dg/analyzer/pr93950.C | 28 +
 2 files changed, 44 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/pr93950.C

diff --git a/gcc/analyzer/diagnostic-manager.cc 
b/gcc/analyzer/diagnostic-manager.cc
index 78c5890054f..b8e59334374 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -1105,6 +1105,7 @@ diagnostic_manager::prune_for_sm_diagnostic (checker_path 
*path,
  else
log ("considering event %i", idx);
}
+  gcc_assert (var == NULL || !CONSTANT_CLASS_P (var));
   switch (base_event->m_kind)
{
default:
@@ -1164,6 +1165,11 @@ diagnostic_manager::prune_for_sm_diagnostic 
(checker_path *path,
log ("event %i: switching var of interest from %qE to %qE",
 idx, var, state_change->m_origin);
var = state_change->m_origin;
+   if (var && CONSTANT_CLASS_P (var))
+ {
+   log ("new var is a constant; setting var to NULL");
+   var = NULL_TREE;
+ }
  }
log ("event %i: switching state of interest from %qs to %qs",
 idx, sm->get_state_name (state_change->m_to),
@@ -1260,6 +1266,11 @@ diagnostic_manager::prune_for_sm_diagnostic 
(checker_path *path,
var = caller_var;
if (expr.param_p ())
  event->record_critical_state (var, state);
+   if (var && CONSTANT_CLASS_P (var))
+ {
+   log ("new var is a constant; setting var to NULL");
+   var = NULL_TREE;
+ }
  }
  }
  break;
@@ -1285,6 +1296,11 @@ diagnostic_manager::prune_for_sm_diagnostic 
(checker_path *path,
var = callee_var;
if (expr.return_value_p ())
  event->record_critical_state (var, state);
+   if (var && CONSTANT_CLASS_P (var))
+ {
+   log ("new var is a constant; setting var to NULL");
+   var = NULL_TREE;
+ }
  }
  }
  }
diff --git a/gcc/testsuite/g++.dg/analyzer/pr93950.C 
b/gcc/testsuite/g++.dg/analyzer/pr93950.C
new file mode 100644
index 000..e2808173407
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/pr93950.C
@@ -0,0 +1,28 @@
+// { dg-do compile { target c++11 } }
+
+struct d
+{
+  struct e
+  {
+int f;
+int *g;
+  };
+  void h (e * i)
+  {
+void *j = nullptr; // { dg-bogus "NULL" "" { xfail *-*-* } }
+// TODO(xfail): we report "'i' is NULL" above, which is the wrong location
+
+i->f = *i->g; // { dg-warning "dereference of NULL 'i'" }
+  }
+  virtual void c (int, int)
+  {
+int *j = nullptr;
+h (nullptr);
+  }
+};
+
+void
+foo ()
+{
+  d ();
+}
-- 
2.21.0



Re: [PATCH] coroutines: Amend parameter handling to match n4849.

2020-02-26 Thread JunMa

On 2020/2/26 9:43 PM, Iain Sandoe wrote:

This is the second in the series to bring the GCC implementation into line
with the current standard.

@JunMa
   I believe that this should solve the problems you were looking at in
  “[PATCH Coroutines] Fix issue with unused coroutine function parameters”
  and supersedes that patch (since we needed to alter the ordering as well).

   If any parameter issues remain, please file a PR (or a patch, of course).

OK for trunk?
thanks
Iain


=

In n4849 and preceding versions, [class.copy.elision] (1.3)
appears to confer additional permissions on coroutines to elide
parameter copies.

After considerable discussion on this topic by email and during
the February 2020 WG21 meeting, it has been determined that there
are no additional permissions applicable to coroutine parameter
copy elision.

The content of that clause in the standard is expected to be amended
eventually to clarify this.  Other than this, the handling of
parameter lifetimes is expected to be as per n4849:

  * A copy is made before the promise is constructed
  * If the promise CTOR uses the parms, then it should use the copy
where appropriate.
  * The param copy lifetimes end after the promise is destroyed
(during the coroutine frame destruction).
  * Otherwise, C++20 copy elision rules apply.

(as an aside) In practice, we expect that copy elision can only occur
when the coroutine body is fully inlined, possibly in conjunction with
heap allocation elision.

The patch:
  * Reorders the copying process to precede the promise CTOR and
 ensures the correct use.
  * Copies all params into the frame regardless of whether the coro
body uses them (this is a bit unfortunate, and we should figure
out an amendment for C++23).

Hi Iain,
    This makes sense to me.

Regards
JunMa

gcc/cp/ChangeLog:

2020-02-26  Iain Sandoe  

* coroutines.cc (struct param_info): Keep track of params
that are references, and cache the original type and whether
the DTOR is trivial.
(build_actor_fn): Handle param copies always, and adjust the
handling for references.
(register_param_uses): Only handle uses here.
(classtype_has_non_deleted_copy_ctor): New.
(morph_fn_to_coro): Adjust param copy handling to match n4849
by reordering ahead of the promise CTOR and always making a
frame copy, even if the param is unused in the coroutine body.

gcc/testsuite/ChangeLog:

2020-02-26  Iain Sandoe  

* g++.dg/coroutines/coro1-refs-and-ctors.h: New.
* g++.dg/coroutines/torture/func-params-07.C: New test.
* g++.dg/coroutines/torture/func-params-08.C: New test.
---
  gcc/cp/coroutines.cc  | 313 +++---
  .../g++.dg/coroutines/coro1-refs-and-ctors.h  | 144 
  .../coroutines/torture/func-params-07.C   |  81 +
  .../coroutines/torture/func-params-08.C   | 112 +++
  4 files changed, 524 insertions(+), 126 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/coro1-refs-and-ctors.h
  create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/func-params-07.C
  create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/func-params-08.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 524d4872804..e0e7e66fe5e 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1742,6 +1742,11 @@ struct param_info
tree field_id;
vec *body_uses;
tree frame_type;
+  tree orig_type;
+  bool by_ref;
+  bool rv_ref;
+  bool pt_ref;
+  bool trivial_dtor;
  };
  
  struct local_var_info

@@ -1941,26 +1946,37 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
  
/* Re-write param references in the body, no code should be generated

   here.  */
-  if (DECL_ARGUMENTS (orig) && param_uses != NULL)
+  if (DECL_ARGUMENTS (orig))
  {
tree arg;
for (arg = DECL_ARGUMENTS (orig); arg != NULL; arg = DECL_CHAIN (arg))
{
  bool existed;
  param_info &parm = param_uses->get_or_insert (arg, &existed);
- if (parm.field_id == NULL_TREE)
-   continue; /* Wasn't used.  */
+ if (!parm.body_uses)
+   continue; /* Wasn't used in the original function body.  */
+
  tree fld_ref = lookup_member (coro_frame_type, parm.field_id,
/*protect=*/1, /*want_type=*/0,
tf_warning_or_error);
- tree fld_idx = build3_loc (loc, COMPONENT_REF, TREE_TYPE (arg),
+ tree fld_idx = build3_loc (loc, COMPONENT_REF, parm.frame_type,
 actor_frame, fld_ref, NULL_TREE);
+
+ /* We keep these in the frame as a regular pointer, so convert that
+  back to the type expected.  */
+ if (parm.pt_ref)
+   fld_idx = build1_loc (loc, CONVERT_EXPR, TREE_TYPE (arg), fld_idx);
+
+ /* We expect an rvalue ref. here.  */
+   

Re: [PATCH] c++: Fix ICE with invalid array bounds [PR93789]

2020-02-26 Thread Marek Polacek
On Wed, Feb 26, 2020 at 06:01:30PM -0700, Martin Sebor wrote:
> On 2/26/20 1:44 PM, Marek Polacek wrote:
> > r7-2111 introduced maybe_constant_value in cp_fully_fold.
> > maybe_constant_value uses cxx_eval_outermost_constant_expr, which
> > can clear TREE_CONSTANT:
> > 6510   else if (non_constant_p && TREE_CONSTANT (r))
> > [...]
> > 6529   TREE_CONSTANT (r) = false;
> > 
> > In this test the array size is '(long int) "h"'.  This used to be
> > TREE_CONSTANT but given the change above, the flag will be cleared
> > when we cp_fully_fold the array size in compute_array_index_type_loc.
> > That means we don't emit an error in the
> > 10391   else if (TREE_CONSTANT (size)
> > block in the same function, and we go on.  Then we compute ITYPE
> > using cp_build_binary_op and use maybe_constant_value on it and
> > suddenly we have something that's TREE_CONSTANT again.  And then we
> > crash in reshape_init_array_1 in tree_to_uhwi, because what we have
> > doesn't fit in an unsigned HWI.
> > 
> > icc accepts this code, but since we used to reject it, I see no desire
> > to make this work, so I've added a check that triggers when we end up
> > with a constant that is not an INTEGER_CST.
> 
> G++ accepts this equivalent:
> 
>   void f () {
> long n = ( long ) "h";
> const int tbl [n] = { 12 };
> ...
>   }
> 
> and it doesn't look like the patch changes that.

It does not, but g++ will reject the same code if you make 'n' const:
a) if 'n' is not const, we won't be able to fold 'n' and it's not
   TREE_CONSTANT -> we treat it as a VLA
b) if 'n' is const, we still can't fold it to an INTEGER_CST but it is
   TREE_CONSTANT -> we give an error
c) if 'n' is constexpr, it's ill-formed since we're converting a pointer
   to integral type (more on this in my next mail)

I would reject this even in a), because we're likely allocating something
huge that could overflow the stack, making the code very questionable.

> I think all these cases of initialized VLAs ought to be handled
> the same way: either they should all be rejected or they should
> all be accepted.  The rather dated pr58646, also a regression,
> is another example of a few ICEs in this area.  See also this
> recently submitted patch:
>   https://gcc.gnu.org/ml/gcc-patches/2020-02/msg01148.html

Yeah, it's a gift that keeps on giving :/.

> FWIW, at one point I got VLA initialization to work again (r234966),
> after it had been removed due to the last minute removal of VLAs
> from the standard, but it caused trouble for Java and I was forced
> to revert the changes.  I've been hoping to revisit it but other
> things have been getting in the way.

I think clang doesn't allow VLA initialization at all, so there's that.
I haven't reviewed our old correspondence on this topic but I think I
was, and still am, for restricting what we allow users to do with VLAs.

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



Re: [PING PATCH coroutines v1] Build co_await/yield_expr with unknown_type in processing_template_decl phase

2020-02-26 Thread JunMa

On 2020/2/10 7:42 PM, JunMa wrote:
Ping~

Regards
JunMa

Kindly ping.

Regards
JunMa

On 2020/2/5 5:17 PM, JunMa wrote:

On 2020/2/5 2:14 PM, JunMa wrote:

Hi
This patch builds co_await/yield_expr with unknown_type when we cannot
know the promise type in the processing_template_decl phase.  This avoids
confusing the compiler when handling type deduction and conversion.

Bootstrap and test on X86_64, is it OK?

Regards
JunMa


Hi
sorry, a '}' was accidentally removed; here is the updated patch :)

Regards
JunMa

gcc/cp
2020-02-05  Jun Ma 

    * coroutines.cc (finish_co_await_expr): Build co_await_expr
    with unknown_type_node.
    (finish_co_yield_expr): Ditto.
    * pt.c (type_dependent_expression_p): Set co_await/yield_expr
    with unknown type as dependent.

gcc/testsuite
2020-02-05  Jun Ma 

    * g++.dg/coroutines/torture/co-await-14-template-traits.C: New test.


Re: [PING PATCH coroutines] Set side effects flag for BIND_EXPR which build in maybe_promote_captured_temps

2020-02-26 Thread JunMa

On 2020/2/11 10:50 AM, JunMa wrote:
Hi
kindly ping~

Regards
JunMa

Hi
As the title says.  In maybe_promote_captured_temps, we promote captured
temporaries and co_await_expr into a new BIND_EXPR.  As the BIND_EXPR
contains co_await_expr and maybe other function calls, the side effects
flag should be set.

This patch fix one mismatch in cppcoro, the testcase comes from cppcoro
and is reduced by creduce.

Bootstrap and test on X86_64, is it OK?

Regards
JunMa

gcc/cp
2020-02-11  Jun Ma 

    * coroutines.cc (maybe_promote_captured_temps): Set side effects
    flag for BIND_EXPR.

gcc/testsuite
2020-02-11  Jun Ma 

    * g++.dg/coroutines/torture/lambda-10-co-await-lambda.C: New test.


Re: [PING PATCH coroutines] Do not strip cleanup_point when promote temporaries out of current stmt

2020-02-26 Thread JunMa

On 2020/2/11 10:14 AM, JunMa wrote:
Kindly ping

Regards
JunMa

Hi
In maybe_promote_captured_temps, the cleanup_point_stmt has been
stripped when handling temporaries captured by reference.  However, there
may be non-reference temporaries in the current stmt, which cause an ICE
in the gimplify pass.

This patch fixes this.  The testcase comes from cppcoro and is reduced by
creduce.

Bootstrap and test on X86_64, is it OK?

Regards
JunMa

gcc/cp
2020-02-11  Jun Ma 

    * coroutines.cc (maybe_promote_captured_temps): Do not strip
    cleanup_point_stmt.

gcc/testsuite
2020-02-11  Jun Ma 

    * g++.dg/coroutines/torture/lambda-09-capture-object.C: New test.


Re: [PING PATCH coroutines] Handle component_ref in captures_temporary

2020-02-26 Thread JunMa

On 2020/2/12 3:23 PM, JunMa wrote:
Kindly ping

Regards
JunMa

Hi
In captures_temporary, the current implementation fails to handle
component_ref.  This causes an ICE in the case of 'co_await a' when
operator co_await is defined in a base class of A.  Also, it is necessary
to capture the object of the base class as if it were a temporary object.

This patch strips a component_ref to its base object and checks it as usual.

Bootstrap and test on X86_64, is it OK?

Regards
JunMa

gcc/cp
2020-02-12  Jun Ma 

    * coroutines.cc (captures_temporary): Strip component_ref
    to its base object.

gcc/testsuite
2020-02-12  Jun Ma 

    * g++.dg/coroutines/torture/co-await-15-capture-comp-ref.C: New test.


Re: [PATCH v2] c++: Fix ICE with invalid array bounds [PR93789]

2020-02-26 Thread Marek Polacek
On Wed, Feb 26, 2020 at 05:54:03PM -0500, Jason Merrill wrote:
> On 2/26/20 3:44 PM, Marek Polacek wrote:
> > r7-2111 introduced maybe_constant_value in cp_fully_fold.
> > maybe_constant_value uses cxx_eval_outermost_constant_expr, which
> > can clear TREE_CONSTANT:
> > 6510   else if (non_constant_p && TREE_CONSTANT (r))
> > [...]
> > 6529   TREE_CONSTANT (r) = false;
> > 
> > In this test the array size is '(long int) "h"'.  This used to be
> > TREE_CONSTANT but given the change above, the flag will be cleared
> > when we cp_fully_fold the array size in compute_array_index_type_loc.
> 
> I wonder about giving an error at that point; if size is TREE_CONSTANT and
> folded is not, we're in a strange situation already.

That works as well; how about the attached patch?

> > That means we don't emit an error in the
> > 10391   else if (TREE_CONSTANT (size)
> > block in the same function, and we go on.  Then we compute ITYPE
> > using cp_build_binary_op and use maybe_constant_value on it and
> > suddenly we have something that's TREE_CONSTANT again.
> 
> Why does maybe_constant_value consider (long)"h" - 1 to be a
> constant-expression?

That's because this check in cxx_eval_outermost_constant_expr:
  if (CONVERT_EXPR_CODE_P (TREE_CODE (r)) 
  && ARITHMETIC_TYPE_P (TREE_TYPE (r)) 
  && TREE_CODE (TREE_OPERAND (r, 0)) == ADDR_EXPR)
{
  if (!allow_non_constant)
error ("conversion from pointer type %qT "
   "to arithmetic type %qT in a constant expression",
   TREE_TYPE (TREE_OPERAND (r, 0)), TREE_TYPE (r));
  non_constant_p = true;
}
doesn't work for all subexpressions, as the comment says.  As a consequence,
this test

constexpr long int
foo ()
{
  return (long int) "foo"
#ifdef FOO
- 1
#endif
;
}

constexpr long int l = foo ();

is accepted with -DFOO but rejected otherwise.  Converting a pointer to an
integral type must be done via a reinterpret_cast and that can't be part of
a core constant expression.  Do you want a PR for this?

-- >8 --
r7-2111 introduced maybe_constant_value in cp_fully_fold.
maybe_constant_value uses cxx_eval_outermost_constant_expr, which
can clear TREE_CONSTANT:
6510   else if (non_constant_p && TREE_CONSTANT (r))
[...]
6529   TREE_CONSTANT (r) = false;

In this test the array size is '(long int) "h"'.  This used to be
TREE_CONSTANT but given the change above, the flag will be cleared
when we cp_fully_fold the array size in compute_array_index_type_loc.
That means we don't emit an error in the
10391   else if (TREE_CONSTANT (size)
block in the same function, and we go on.  Then we compute ITYPE
using cp_build_binary_op and use maybe_constant_value on it and
suddenly we have something that's TREE_CONSTANT again.  And then we
crash in reshape_init_array_1 in tree_to_uhwi, because what we have
doesn't fit in an unsigned HWI.

icc accepts this code, but since we used to reject it, I see no desire
to make this work, so don't use the folded result when we've lost
the TREE_CONSTANT flag while evaluating the size.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2020-02-26  Marek Polacek  

PR c++/93789 - ICE with invalid array bounds.
* decl.c (compute_array_index_type_loc): Don't use the folded
size when folding cleared TREE_CONSTANT.

* g++.dg/ext/vla22.C: New test.
---
 gcc/cp/decl.c| 11 ---
 gcc/testsuite/g++.dg/ext/vla22.C |  9 +
 2 files changed, 17 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/vla22.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1947c4ddb7f..e3f4b435a49 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10338,9 +10338,14 @@ compute_array_index_type_loc (location_t name_loc, tree name, tree size,
pedwarn (loc, OPT_Wpedantic,
 "size of array is not an integral constant-expression");
}
-  /* Use the folded result for VLAs, too; it will have resolved
-SIZEOF_EXPR.  */
-  size = folded;
+  if (TREE_CONSTANT (size) && !TREE_CONSTANT (folded))
+   /* We might have lost the TREE_CONSTANT flag e.g. when we are
+  folding a conversion from a pointer to integral type.  In that
+  case issue an error below and don't treat this as a VLA.  */;
+  else
+   /* Use the folded result for VLAs, too; it will have resolved
+  SIZEOF_EXPR.  */
+   size = folded;
 }
 
   /* Normally, the array-bound will be a constant.  */
diff --git a/gcc/testsuite/g++.dg/ext/vla22.C b/gcc/testsuite/g++.dg/ext/vla22.C
new file mode 100644
index 000..2308ee748df
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/vla22.C
@@ -0,0 +1,9 @@
+// PR c++/93789 - ICE with invalid array bounds.
+// { dg-do compile }
+// { dg-options "" }
+
+void
+f ()
+{
+  const int tbl[(long) "h"] = { 12 }; // { dg-error "size of array .tbl. is not an integral constant-expression" }
+}

base-commit: 89f759ac2ebb9f09ce5655ce5d791793922c612d
-- 

Re: [PATCH v2] c++: Fix ICE with invalid array bounds [PR93789]

2020-02-26 Thread Jason Merrill

On 2/26/20 9:31 PM, Marek Polacek wrote:

On Wed, Feb 26, 2020 at 05:54:03PM -0500, Jason Merrill wrote:

On 2/26/20 3:44 PM, Marek Polacek wrote:

r7-2111 introduced maybe_constant_value in cp_fully_fold.
maybe_constant_value uses cxx_eval_outermost_constant_expr, which
can clear TREE_CONSTANT:
6510   else if (non_constant_p && TREE_CONSTANT (r))
[...]
6529   TREE_CONSTANT (r) = false;

In this test the array size is '(long int) "h"'.  This used to be
TREE_CONSTANT but given the change above, the flag will be cleared
when we cp_fully_fold the array size in compute_array_index_type_loc.


I wonder about giving an error at that point; if size is TREE_CONSTANT and
folded is not, we're in a strange situation already.


That works as well; how about the attached patch?


OK.


That means we don't emit an error in the
10391   else if (TREE_CONSTANT (size)
block in the same function, and we go on.  Then we compute ITYPE
using cp_build_binary_op and use maybe_constant_value on it and
suddenly we have something that's TREE_CONSTANT again.


Why does maybe_constant_value consider (long)"h" - 1 to be a
constant-expression?


That's because this check in cxx_eval_outermost_constant_expr:
   if (CONVERT_EXPR_CODE_P (TREE_CODE (r))
   && ARITHMETIC_TYPE_P (TREE_TYPE (r))
   && TREE_CODE (TREE_OPERAND (r, 0)) == ADDR_EXPR)
 {
   if (!allow_non_constant)
 error ("conversion from pointer type %qT "
"to arithmetic type %qT in a constant expression",
TREE_TYPE (TREE_OPERAND (r, 0)), TREE_TYPE (r));
   non_constant_p = true;
 }
doesn't work for all subexpressions, as the comment says.  As a consequence,
this test

constexpr long int
foo ()
{
   return (long int) "foo"
#ifdef FOO
 - 1
#endif
 ;
}

constexpr long int l = foo ();

is accepted with -DFOO but rejected otherwise.  Converting a pointer to an
integral type must be done via a reinterpret_cast and that can't be part of
a core constant expression.  Do you want a PR for this?


Please.


-- >8 --
r7-2111 introduced maybe_constant_value in cp_fully_fold.
maybe_constant_value uses cxx_eval_outermost_constant_expr, which
can clear TREE_CONSTANT:
6510   else if (non_constant_p && TREE_CONSTANT (r))
[...]
6529   TREE_CONSTANT (r) = false;

In this test the array size is '(long int) "h"'.  This used to be
TREE_CONSTANT but given the change above, the flag will be cleared
when we cp_fully_fold the array size in compute_array_index_type_loc.
That means we don't emit an error in the
10391   else if (TREE_CONSTANT (size)
block in the same function, and we go on.  Then we compute ITYPE
using cp_build_binary_op and use maybe_constant_value on it and
suddenly we have something that's TREE_CONSTANT again.  And then we
crash in reshape_init_array_1 in tree_to_uhwi, because what we have
doesn't fit in an unsigned HWI.

icc accepts this code, but since we used to reject it, I see no desire
to make this work, so don't use the folded result when we've lost
the TREE_CONSTANT flag while evaluating the size.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2020-02-26  Marek Polacek  

PR c++/93789 - ICE with invalid array bounds.
* decl.c (compute_array_index_type_loc): Don't use the folded
size when folding cleared TREE_CONSTANT.

* g++.dg/ext/vla22.C: New test.
---
  gcc/cp/decl.c| 11 ---
  gcc/testsuite/g++.dg/ext/vla22.C |  9 +
  2 files changed, 17 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/ext/vla22.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1947c4ddb7f..e3f4b435a49 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10338,9 +10338,14 @@ compute_array_index_type_loc (location_t name_loc, tree name, tree size,
pedwarn (loc, OPT_Wpedantic,
 "size of array is not an integral constant-expression");
}
-  /* Use the folded result for VLAs, too; it will have resolved
-SIZEOF_EXPR.  */
-  size = folded;
+  if (TREE_CONSTANT (size) && !TREE_CONSTANT (folded))
+   /* We might have lost the TREE_CONSTANT flag e.g. when we are
+  folding a conversion from a pointer to integral type.  In that
+  case issue an error below and don't treat this as a VLA.  */;
+  else
+   /* Use the folded result for VLAs, too; it will have resolved
+  SIZEOF_EXPR.  */
+   size = folded;
  }
  
/* Normally, the array-bound will be a constant.  */

diff --git a/gcc/testsuite/g++.dg/ext/vla22.C b/gcc/testsuite/g++.dg/ext/vla22.C
new file mode 100644
index 000..2308ee748df
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/vla22.C
@@ -0,0 +1,9 @@
+// PR c++/93789 - ICE with invalid array bounds.
+// { dg-do compile }
+// { dg-options "" }
+
+void
+f ()
+{
+  const int tbl[(long) "h"] = { 12 }; // { dg-error "size of array .tbl. is not an integral constant-expression" }
+}

base-commit: 89f759ac2ebb9f09ce5655ce5d791793922c612d

Re: [PATCH v2] c++: Fix ICE with invalid array bounds [PR93789]

2020-02-26 Thread Marek Polacek
On Wed, Feb 26, 2020 at 09:41:14PM -0500, Jason Merrill wrote:
> On 2/26/20 9:31 PM, Marek Polacek wrote:
> > On Wed, Feb 26, 2020 at 05:54:03PM -0500, Jason Merrill wrote:
> > > On 2/26/20 3:44 PM, Marek Polacek wrote:
> > > > r7-2111 introduced maybe_constant_value in cp_fully_fold.
> > > > maybe_constant_value uses cxx_eval_outermost_constant_expr, which
> > > > can clear TREE_CONSTANT:
> > > > 6510   else if (non_constant_p && TREE_CONSTANT (r))
> > > > [...]
> > > > 6529   TREE_CONSTANT (r) = false;
> > > > 
> > > > In this test the array size is '(long int) "h"'.  This used to be
> > > > TREE_CONSTANT but given the change above, the flag will be cleared
> > > > when we cp_fully_fold the array size in compute_array_index_type_loc.
> > > 
> > > I wonder about giving an error at that point; if size is TREE_CONSTANT and
> > > folded is not, we're in a strange situation already.
> > 
> > That works as well; how about the attached patch?
> 
> OK.

Thanks.

> > > > That means we don't emit an error in the
> > > > 10391   else if (TREE_CONSTANT (size)
> > > > block in the same function, and we go on.  Then we compute ITYPE
> > > > using cp_build_binary_op and use maybe_constant_value on it and
> > > > suddenly we have something that's TREE_CONSTANT again.
> > > 
> > > Why does maybe_constant_value consider (long)"h" - 1 to be a
> > > constant-expression?
> > 
> > That's because this check in cxx_eval_outermost_constant_expr:
> >if (CONVERT_EXPR_CODE_P (TREE_CODE (r))
> >&& ARITHMETIC_TYPE_P (TREE_TYPE (r))
> >&& TREE_CODE (TREE_OPERAND (r, 0)) == ADDR_EXPR)
> >  {
> >if (!allow_non_constant)
> >  error ("conversion from pointer type %qT "
> > "to arithmetic type %qT in a constant expression",
> > TREE_TYPE (TREE_OPERAND (r, 0)), TREE_TYPE (r));
> >non_constant_p = true;
> >  }
> > doesn't work for all subexpressions, as the comment says.  As a consequence,
> > this test
> > 
> > constexpr long int
> > foo ()
> > {
> >return (long int) "foo"
> > #ifdef FOO
> >  - 1
> > #endif
> >  ;
> > }
> > 
> > constexpr long int l = foo ();
> > 
> > is accepted with -DFOO but rejected otherwise.  Converting a pointer to an
> > integral type must be done via a reinterpret_cast and that can't be part of
> > a core constant expression.  Do you want a PR for this?
> 
> Please.

https://gcc.gnu.org/PR93955

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



[committed, docs] Document negative form of warning options enabled by default [PR90467]

2020-02-26 Thread Sandra Loosemore
I've checked in this patch to address PR90467, which complained "many 
warning options that are enabled by default are documented in the 
-Woption form, not -Wno-option".  While I was in there I also found


- the option summary lists, which are supposed to be (mostly) 
alphabetized, had gotten jumbled up again


- some options were classified differently in the option summary lists 
than the section where the option documentation appears, or were missing 
from one place or another or listed in more than one summary


- some of the C++-specific warning options were listed in the C++ option 
section and some were listed with the other warning options


- a few missing index entries, markup issues, and other minor 
copy-editing problems


...and I tried to fix those things up as well.

There's more I'd like to do here:

- Currently the entries for each warning option are not consistent about 
identifying what languages the option is useful with.  E.g. many of the 
C++-specific options say "(C++ and Objective-C++ only)", but not all of 
them do.  And very few of the other warning options say whether they 
apply to all C-family languages or just C (and Objective-C?) or are not 
language-specific at all.  This needs another pass through the .opt 
files where these options are defined.


- As I mentioned when I originally reviewed the Static Analyzer 
documentation, I think it would probably be better to introduce some 
substructure in the Warnings section to have subsections for C++ 
warnings, Static Analyzer warnings, etc instead of putting the warning 
options in other sections with non-warning options.  At least the 
current placement of the docs for those options is an intermediate step 
in collecting them together by topic.


- The order of the documentation for the options within each section is 
pretty much random and probably needs some cleanup.


Anyway, the current patch is an incremental improvement and addresses 
the PR, and it seems better to check it in before I get interrupted by 
other tasks.


-Sandra
commit cf70bb0fbd7fb0f4bca99c53a36d0a389dcc2fc5
Author: Sandra Loosemore 
Date:   Wed Feb 26 19:51:06 2020 -0800

Document negative form of warning options enabled by default [PR90467].

2020-02-26  Sandra Loosemore  

	PR c++/90467

	gcc/
	* doc/invoke.texi (Option Summary): Re-alphabetize warnings in
	C++ Language Options, Warning Options, and Static Analyzer
	Options lists.  Document negative form of options enabled by
	default.  Move some things around to more accurately sort
	warnings by category.
	(C++ Dialect Options, Warning Options, Static Analyzer
	Options): Document negative form of options when enabled by
	default.  Move some things around to more accurately sort
	warnings by category.  Add some missing index entries.
	Light copy-editing.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 48f7d5d..d489f14 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2020-02-26  Sandra Loosemore  
+
+	PR c++/90467
+
+	* doc/invoke.texi (Option Summary): Re-alphabetize warnings in
+	C++ Language Options, Warning Options, and Static Analyzer
+	Options lists.  Document negative form of options enabled by
+	default.  Move some things around to more accurately sort
+	warnings by category.
+	(C++ Dialect Options, Warning Options, Static Analyzer
+	Options): Document negative form of options when enabled by
+	default.  Move some things around to more accurately sort
+	warnings by category.  Add some missing index entries.
+	Light copy-editing.
+
 2020-02-26  Carl Love  
 
 	PR target/91276
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 208500c..e70ece6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -231,20 +231,26 @@ in the following sections.
 -fvisibility-inlines-hidden @gol
 -fvisibility-ms-compat @gol
 -fext-numeric-literals @gol
--Wabi=@var{n}  -Wabi-tag  -Wcomma-subscript  -Wconversion-null @gol
--Wctor-dtor-privacy @gol
+-Wabi-tag  -Wcatch-value  -Wcatch-value=@var{n} @gol
+-Wno-class-conversion  -Wclass-memaccess @gol
+-Wcomma-subscript  -Wconditionally-supported @gol
+-Wno-conversion-null  -Wctor-dtor-privacy  -Wno-delete-incomplete @gol
 -Wdelete-non-virtual-dtor  -Wdeprecated-copy  -Wdeprecated-copy-dtor @gol
--Wliteral-suffix -Wmismatched-tags @gol
--Wmultiple-inheritance  -Wno-init-list-lifetime @gol
--Wnamespaces  -Wnarrowing @gol
--Wpessimizing-move  -Wredundant-move -Wredundant-tags @gol
--Wnoexcept  -Wnoexcept-type  -Wclass-memaccess @gol
--Wnon-virtual-dtor  -Wreorder  -Wregister @gol
--Weffc++  -Wstrict-null-sentinel  -Wtemplates @gol
+-Weffc++  -Wextra-semi  -Wno-inaccessible-base @gol
+-Wno-inherited-variadic-ctor  -Wno-init-list-lifetime @gol
+-Wno-invalid-offsetof  -Wno-literal-suffix  -Wmismatched-tags @gol
+-Wmultiple-inheritance  -Wnamespaces  -Wnarrowing @gol
+-Wnoexcept  -Wnoexcept-type  -Wnon-virtual-dtor @gol
+-Wpessimizing-move  -Wno-placement-new  -Wplacement-n

Re: [mid-end] Add notes to dataflow insn info when re-emitting (PR92410)

2020-02-26 Thread Roman Zhuykov
Hi all!

Has anybody considered backporting this?
Given https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410#c3
"Confirmed also with r247015 (GCC 7 branch point)."

For example, we may apply it without an assertion in release branches.


Roman

12.11.2019 12:11, Matthew Malcomson wrote:
> In scheduling passes, notes are removed with `remove_notes` before the
> scheduling is done, and added back in with `reemit_notes` once the
> scheduling has been decided.
>
> This process leaves the notes in the RTL chain with different insn uid's
> than were there before.  Having different UID's (larger than the
> previous ones) means that DF_INSN_INFO_GET(insn) will access outside of
> the allocated array.
>
> This has been seen in the `regstat_bb_compute_calls_crossed` function.
> This patch adds an assert to the `regstat_bb_compute_calls_crossed`
> function so that bad accesses here are caught instead of going
> unnoticed, and then avoids the problem.
>
> We avoid the problem by ensuring that new notes added by `reemit_notes`
> have an insn record given to them.  This is done by adding a call to
> `df_insn_create_insn_record` on each note added in `reemit_notes`.
> `df_insn_create_insn_record` leaves this new record zeroed out, which appears
> to be fine for notes (e.g. `df_bb_refs_record` already does not set
> anything except the luid for notes, and notes have no dataflow information to
> record).
>
> We add the testcase that Martin found here
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410#c2 .
> This testcase fails with the "regstat.c" change, and then succeeds with the
> "haifa-sched.c" change.
>
> There is a similar problem with labels, that the `gcc_assert` catches
> when running regression tests in gcc.dg/fold-eqandshift-1.c and
> gcc.c-torture/compile/pr32482.c.
> This is due to the `cfg_layout_finalize` call in `bb-reorder.c` emitting
> new labels for the start of the newly created basic blocks. These labels are
> not given a dataflow df_insn_info member.
>
> We solve this by manually calling `df_recompute_luids` on each basic
> block once this pass has finished.
>
> Testing done:
>   Bootstrapped and regression test on aarch64-none-linux-gnu native.
>
> gcc/ChangeLog:
>
> 2019-11-12  Matthew Malcomson  
>
>   PR middle-end/92410
>   * bb-reorder.c (pass_reorder_blocks::execute): Recompute
>   dataflow luids once basic blocks have been reordered.
>   * haifa-sched.c (reemit_notes): Create df insn record for each
>   new note.
>   * regstat.c (regstat_bb_compute_calls_crossed): Assert every
>   insn has an insn record before trying to use it.
>
> gcc/testsuite/ChangeLog:
>
> 2019-11-12  Matthew Malcomson  
>
>   PR middle-end/92410
>   * gcc.dg/torture/pr92410.c: New test.
>
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index 0ac39140c6ce3db8499f99cd8f48321de61b..4943f97a2cfccc7ab7c4834fff2c81f62fc5b738 100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -2662,6 +2662,8 @@ pass_reorder_blocks::execute (function *fun)
>bb->aux = bb->next_bb;
>cfg_layout_finalize ();
>  
> +  FOR_EACH_BB_FN (bb, fun)
> +df_recompute_luids (bb);
>return 0;
>  }
>  
> diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
> index 41cf1f362e8c34d009b0a310ff5b9a9ffb613631..2e1a84f1325d54ece1cf3c6f0bab607099c71d27 100644
> --- a/gcc/haifa-sched.c
> +++ b/gcc/haifa-sched.c
> @@ -5433,6 +5433,7 @@ reemit_notes (rtx_insn *insn)
>  
> last = emit_note_before (note_type, last);
> remove_note (insn, note);
> +   df_insn_create_insn_record (last);
>   }
>  }
>  }
> diff --git a/gcc/regstat.c b/gcc/regstat.c
> index 4da9b7cc523cde113df931226ea56310b659e619..c6cefb117d7330cc1b9787c37083e05e2acda2d2 100644
> --- a/gcc/regstat.c
> +++ b/gcc/regstat.c
> @@ -324,6 +324,7 @@ regstat_bb_compute_calls_crossed (unsigned int bb_index, bitmap live)
>  
>FOR_BB_INSNS_REVERSE (bb, insn)
>  {
> +  gcc_assert (INSN_UID (insn) < DF_INSN_SIZE ());
>struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
>unsigned int regno;
>  
> diff --git a/gcc/testsuite/gcc.dg/torture/pr92410.c 
> b/gcc/testsuite/gcc.dg/torture/pr92410.c
> new file mode 100644
> index ..628e329782260ef36f26506bfbc0d36a93b33f41
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr92410.c
> @@ -0,0 +1,8 @@
> +/* PR middle-end/92410  */
> +/* { dg-do compile } */
> +int v;
> +
> +int a() {
> +  ;
> +  return v;
> +}
>