PING^1 [PATCH v2] combine: Tweak the condition of last_set invalidation

2021-06-28 Thread Kewen.Lin via Gcc-patches
Hi!

I'd like to gently ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572555.html


BR,
Kewen

on 2021/6/11 9:16 PM, Kewen.Lin via Gcc-patches wrote:
> Hi Segher,
> 
> Thanks for the review!
> 
> on 2021/6/10 4:17 AM, Segher Boessenkool wrote:
>> Hi!
>>
>> On Wed, Dec 16, 2020 at 04:49:49PM +0800, Kewen.Lin wrote:
>>> Currently we have the check:
>>>
>>>   if (!insn
>>>       || (value && rsp->last_set_table_tick >= label_tick_ebb_start))
>>>     rsp->last_set_invalid = 1;
>>>
>>> which means that if we want to record some value for some reg and
>>> this reg was referred to before in a valid scope,
>>
>> If we already know it is *set* in this same extended basic block.
>> Possibly by the same instruction btw.
>>
>>> we invalidate the
>>> set of the reg (set last_set_invalid to 1).  This avoids finding the
>>> wrong set for a reg reference, as in a case like:
>>>
>>>... op regX  // this regX could find wrong last_set below
>>>regX = ...   // if we think this set is valid
>>>... op regX
>>
>> Yup, exactly.
>>
>>> But because combine can retry, last_set_table_tick may already
>>> have been set by some later referencing insn by the time the retry
>>> revisits the insn that sets that reg, such as:
>>>
>>>insn 1
>>>insn 2
>>>
>>>regX = ... --> (a)
>>>... op regX--> (b)
>>>
>>>insn 3
>>>
>>>// assume all in the same BB.
>>>
>>> Assuming we combine 1, 2 -> 3 successfully and replace them with two
>>> (3 insns -> 2 insns),
>>
>> This will delete insn 1 and write the combined result to insns 2 and 3.
>>
>>> retrying from insn1 or insn2 again:
>>
>> Always 2, but your point remains valid.
>>
>>> it will scan insn (a) again, and the condition below holds for regX:
>>>
>>>   (value && rsp->last_set_table_tick >= label_tick_ebb_start)
>>>
>>> so it will mark this set as invalid.  But actually the
>>> last_set_table_tick here was set by insn (b) before retrying, so it
>>> should be safe to treat this as a valid set.
>>
>> Yup.
>>
>>> This proposal checks whether the reference recorded in last_set_table
>>> safely happens after the current set, and keeps the set valid if so.
>>
>>> A full SPEC2017 build shows this patch increases the number of
>>> successful combines from 1902208 to 1902243 (trivial, though).
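As a rough sketch of the idea, the proposed check can be modelled as a pure function over the per-register state. This is a simplified model, not combine.c code: `reg_state`, `must_invalidate` and the parameter names are hypothetical, ticks and luids are plain integers, and a set and a use at the same luid are conservatively treated as conflicting.

```cpp
#include <cassert>

// Hypothetical, simplified per-register state (not combine.c's
// reg_stat_type): the tick of the block holding the recorded use and the
// luid of that use.
struct reg_state
{
  int last_set_table_tick;
  int last_set_table_luid;
};

// Should recording a set of the register (at SET_LUID, in the block whose
// tick is LABEL_TICK) invalidate last_set?
// Old rule: invalidate whenever a use was already recorded in this EBB.
// Refinement sketched here: keep the set valid when the recorded use is in
// the same block and comes strictly after the set.
bool
must_invalidate (const reg_state &rs, bool has_insn, bool has_value,
                 int label_tick, int label_tick_ebb_start, int set_luid)
{
  if (!has_insn)
    return true;
  if (!has_value || rs.last_set_table_tick < label_tick_ebb_start)
    return false;
  bool use_after_set = (rs.last_set_table_tick == label_tick
                        && rs.last_set_table_luid > set_luid);
  return !use_after_set;
}
```

In the retry scenario above, the recorded use comes from insn (b), which has a larger luid than the set (a) in the same block, so the set stays valid; a use recorded before the set, or in an earlier block of the same EBB, still invalidates it.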
>>
>> Do you have some example, or maybe even a testcase?  :-)
>>
> 
> Sorry for the late reply; it took some time to get a reduced case.
> 
> typedef struct SA *pa_t;
> 
> struct SC {
>   int h;
>   pa_t elem[];
> };
> 
> struct SD {
>   struct SC *e;
> };
> 
> struct SA {
>   struct {
>     struct SD f[1];
>   } g;
> };
> 
> void foo(pa_t *k, char **m) {
>   int l, i;
>   pa_t a;
>   l = (int)a->g.f[5].e;
>   i = 0;
>   for (; i < l; i++) {
>     k[i] = a->g.f[5].e->elem[i];
>     m[i] = "";
>   }
> }
> 
> Baseline is r12-0 and the options are "-O3 -mcpu=power9 -fno-strict-aliasing";
> with this patch, the generated assembly saves two rlwinm instructions.
> 
>>> +  /* Record the luid of the insn whose expression involving register n.  */
>>> +
>>> +  int  last_set_table_luid;
>>
>> "Record the luid of the insn for which last_set_table_tick was set",
>> right?
>>
> 
> But it can later be updated to a smaller luid.  How about wording like:
> 
> 
> +  /* Record the luid of the insn which uses register n, the insn should
> + be the first one using register n in that block of the insn which
> + last_set_table_tick was set for.  */
> 
> 
>>> -static void update_table_tick (rtx);
>>> +static void update_table_tick (rtx, int);
>>
>> Please remove this declaration instead, the function is not used until
>> after its actual definition :-)
>>
> 
> Done.
> 
>>> @@ -13243,7 +13247,21 @@ update_table_tick (rtx x)
>>>        for (r = regno; r < endregno; r++)
>>>          {
>>>            reg_stat_type *rsp = &reg_stat[r];
>>> -          rsp->last_set_table_tick = label_tick;
>>> +          if (rsp->last_set_table_tick >= label_tick_ebb_start)
>>> +            {
>>> +              /* Later references should not have lower ticks.  */
>>> +              gcc_assert (label_tick >= rsp->last_set_table_tick);
>>
>> This should be obvious, but checking it won't hurt, okay.
>>
>>> +              /* Should pick up the lowest luid if the references
>>> +                 are in the same block.  */
>>> +              if (label_tick == rsp->last_set_table_tick
>>> +                  && rsp->last_set_table_luid > insn_luid)
>>> +                rsp->last_set_table_luid = insn_luid;
>>
>> Why?  Is it conservative for the check you will do later?  Please spell
>> this out, it is crucial!
>>
> 
> Because later combinations involving this insn may make the register
> be used in an insn sitting earlier (one with a smaller luid than the
> one recorded before).  Yes, it's very conservative: this ensures that
> we always use the luid of the first insn using this register in the
> block.  The last_set invalidation is meant to catch cases like:
> 
>... regX  // avoid the set used here ...
>regX = ...
>...
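The update rule under discussion can be sketched the same way. Again this is a hypothetical, simplified model (`reg_use_info` and `record_use` are illustrative names, not combine.c functions): the same-block branch follows the quoted diff, while resetting the luid when a use is recorded in a later block is an assumption, since that part of the patch is not shown here.

```cpp
#include <cassert>

// Hypothetical model: the block tick of the recorded use and the lowest
// luid of any use of the register seen in that block.
struct reg_use_info
{
  int last_set_table_tick;
  int last_set_table_luid;
};

void
record_use (reg_use_info &r, int label_tick, int label_tick_ebb_start,
            int insn_luid)
{
  if (r.last_set_table_tick >= label_tick_ebb_start)
    {
      // Later references should not have lower ticks.
      assert (label_tick >= r.last_set_table_tick);
      // Same block: only ever move the recorded luid backwards, so a retry
      // that revisits an earlier use stays conservative.
      if (label_tick == r.last_set_table_tick)
        {
          if (r.last_set_table_luid > insn_luid)
            r.last_set_table_luid = insn_luid;
          return;
        }
    }
  // First use in this EBB, or a use in a later block (assumed behaviour):
  // start tracking afresh.
  r.last_set_table_tick = label_tick;
  r.last_set_table_luid = insn_luid;
}
```

Keeping the minimum rather than the latest luid is what makes the check conservative: any set whose luid is below the recorded value precedes every use recorded in that block.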
> 
> Once 

Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec

2021-06-28 Thread Trevor Saunders
On Wed, Jun 23, 2021 at 05:43:32PM -0600, Martin Sebor wrote:
> On 6/22/21 11:23 PM, Trevor Saunders wrote:
> > On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote:
> > > On 6/21/21 1:15 AM, Richard Biener wrote:
> > > > On Fri, Jun 18, 2021 at 6:03 PM Martin Sebor  wrote:
> > > > > 
> > > > > On 6/18/21 4:38 AM, Richard Biener wrote:
> > > > > > On Thu, Jun 17, 2021 at 4:43 PM Martin Sebor  wrote:
> > > > > > > 
> > > > > > > On 6/17/21 12:03 AM, Richard Biener wrote:
> > > > > > > > On Wed, Jun 16, 2021 at 6:01 PM Martin Sebor  wrote:
> > > > > > > > > 
> > > > > > > > > On 6/16/21 6:46 AM, Richard Sandiford via Gcc-patches wrote:
> > > > > > > > > > Richard Biener via Gcc-patches  writes:
> > > > > > > > > > > On Tue, Jun 15, 2021 at 8:02 AM Trevor Saunders  wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > This makes it clear the caller owns the vector, and ensures it is cleaned up.
> > > > > > > > > > > > 
> > > > > > > > > > > > Signed-off-by: Trevor Saunders 
> > > > > > > > > > > > 
> > > > > > > > > > > > bootstrapped and regtested on x86_64-linux-gnu, ok?
> > > > > > > > > > > 
> > > > > > > > > > > OK.
> > > > > > > > > > > 
> > > > > > > > > > > Btw, are "standard API" returns places we can use 'auto'?  That would avoid
> > > > > > > > > > > excessive indent for
> > > > > > > > > > > 
> > > > > > > > > > > -  dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
> > > > > > > > > > > -                                     bbs.address (),
> > > > > > > > > > > -                                     bbs.length ());
> > > > > > > > > > > +  auto_vec dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
> > > > > > > > > > > +                                              bbs.address (),
> > > > > > > > > > > +                                              bbs.length ());
> > > > > > > > > > > 
> > > > > > > > > > > and just use
> > > > > > > > > > > 
> > > > > > > > > > >   auto dom_bbs = get_dominated_by_region (...
> > > > > > > > > > > 
> > > > > > > > > > > Not asking you to do this, just a question for the audience.
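On the `auto` question, a minimal sketch with a hypothetical move-only container shows the language rule involved: `auto` works for a by-value return even when copying is deleted, because the initialization uses the move constructor (or is elided). `move_only_vec` stands in for auto_vec and `collect_blocks` for get_dominated_by_region; neither is GCC code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical move-only owning container (not GCC's auto_vec):
// copying is deleted, moving transfers ownership of the storage.
template <typename T>
class move_only_vec
{
public:
  move_only_vec () = default;
  move_only_vec (const move_only_vec &) = delete;
  move_only_vec &operator= (const move_only_vec &) = delete;
  move_only_vec (move_only_vec &&) = default;
  move_only_vec &operator= (move_only_vec &&) = default;

  void safe_push (const T &x) { m_data.push_back (x); }
  std::size_t length () const { return m_data.size (); }

private:
  std::vector<T> m_data;
};

// Hypothetical API returning ownership of its result to the caller.
move_only_vec<int>
collect_blocks (const int *bbs, std::size_t n)
{
  move_only_vec<int> result;
  for (std::size_t i = 0; i < n; ++i)
    result.safe_push (bbs[i]);
  return result;  // implicitly moved (or elided), never copied
}
```

With this type, `auto dom_bbs = collect_blocks (bbs, 3);` compiles, while a later `auto copy = dom_bbs;` would be rejected at compile time; moving explicitly with `std::move` would be accepted.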
> > > > > > > > > > 
> > > > > > > > > > Personally I think this would be surprising for something that doesn't
> > > > > > > > > > have copy semantics.  (Not that I'm trying to reopen that debate here :-)
> > > > > > > > > > FWIW, I agree not having copy semantics is probably the most practical
> > > > > > > > > > way forward for now.)
> > > > > > > > > 
> > > > > > > > > But you did open the door for me to reiterate my strong disagreement
> > > > > > > > > with that.  The best C++ practice going back to the early 1990's is
> > > > > > > > > to make types safely copyable and assignable.  It is the default for
> > > > > > > > > all types, in both C++ and C, and so natural and expected.
> > > > > > > > > 
> > > > > > > > > Preventing copying is appropriate in special and rare circumstances
> > > > > > > > > (e.g., a mutex may not be copyable, or a file or iostream object may
> > > > > > > > > not be because they represent a unique physical resource.)
> > > > > > > > > 
> > > > > > > > > In the absence of such special circumstances preventing copying is
> > > > > > > > > unexpected, and in the case of an essential building block such as
> > > > > > > > > a container, makes the type difficult to use.
> > > > > > > > > 
> > > > > > > > > The only argument for disabling copying that has been given is
> > > > > > > > > that it could be surprising(*).  But because all types are copyable
> > > > > > > > > by default the "surprise" is usually when one can't be.
> > > > > > > > > 
> > > > > > > > > I think Richi's "surprising" has to do with the fact that it lets
> > > > > > > > > one inadvertently copy a large amount of data, thus leading to
> > > > > > > > > an inefficiency.  But by analogy, there are infinitely many ways
> > > > > > > > > to end up with inefficient code (e.g., deep recursion, or heap
> > > > > > > > > allocation in a loop), and they are not a reason to ban the coding
> > > > > > > > > constructs that might lead to it.
> > > > > > > > > 
> > > > > > > > > IIUC, Jason's comment about surprising effects was about implicit
> > > > > > > > > conversion from auto_vec to vec.  I share that concern, and agree
> > > > > > > > > that it should be addressed by preventing the conversion (as Jason
> > > > > > > > > suggested).
> > > > > > > > 
> > > > > > > > But fact is that how vec<> and auto_vec<> are used today in G

PING^2 [PATCH v2] rs6000: Add load density heuristic

2021-06-28 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping on this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571258.html

BR,
Kewen

on 2021/6/9 10:26 AM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> Gentle ping on this:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571258.html
> 
> BR,
> Kewen
> 
> on 2021/5/26 10:59 AM, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> This is the updated version of the patch to deal with the bwaves_r
>> degradation caused by vector construction fed by strided loads.
>>
>> Following Richi's comments [1], this uses a similar idea of
>> over-pricing vector construction fed by VMAT_ELEMENTWISE or
>> VMAT_STRIDED_SLP.  Instead of adding the extra cost to the vector
>> construction cost immediately, it first records how many loads and
>> vectorized statements there are in the given loop.  Later, in
>> rs6000_density_test (called by finish_cost), it computes the ratio of
>> loads to all vectorized stmts, checks it against the corresponding
>> thresholds DENSITY_LOAD_NUM_THRESHOLD and DENSITY_LOAD_PCT_THRESHOLD,
>> and does the actual extra pricing only if both thresholds are
>> exceeded.
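A minimal sketch of the deferred check follows. The threshold values are hypothetical (the mail names DENSITY_LOAD_NUM_THRESHOLD and DENSITY_LOAD_PCT_THRESHOLD but not their values), and `load_density_data` / `apply_load_density` are illustrative names loosely modelled on the new nstmts, nloads and extra_ctor_cost members, not rs6000.c code.

```cpp
#include <cassert>

// Hypothetical threshold values, for illustration only.
const unsigned DENSITY_LOAD_NUM_THRESHOLD = 20;
const unsigned DENSITY_LOAD_PCT_THRESHOLD = 45;

// Counters accumulated while costing each statement.
struct load_density_data
{
  unsigned nstmts;           // vectorized statements seen
  unsigned nloads;           // how many of them are loads
  unsigned extra_ctor_cost;  // deferred penalty for strided-load ctors
};

// Apply the deferred penalty only when the loop is load-dense by both
// measures: absolute load count and load percentage.
unsigned
apply_load_density (const load_density_data &d, unsigned body_cost)
{
  if (d.nstmts == 0 || d.extra_ctor_cost == 0)
    return body_cost;
  unsigned load_pct = d.nloads * 100 / d.nstmts;
  if (d.nloads > DENSITY_LOAD_NUM_THRESHOLD
      && load_pct > DENSITY_LOAD_PCT_THRESHOLD)
    return body_cost + d.extra_ctor_cost;
  return body_cost;
}
```

Deferring the decision to a finish-time hook means the penalty depends on the whole loop's mix of statements rather than on any single construction.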
>>
>> Note that this new load density heuristic is based on some fields
>> in the target cost data which are updated as needed when scanning
>> each add_stmt_cost entry, so it is independent of the existing check
>> in rs6000_density_test, which needs to scan non_vect stmts.  Since it
>> compares the load stmt count against all vectorized stmts, it is a
>> kind of density, so I put it in rs6000_density_test.  For the same
>> reason of keeping it independent, I didn't put it in an else arm of
>> the existing density threshold check hunk, or before that hunk.
>>
>> While investigating the -1.04% degradation of 526.blender_r on
>> Power8, I noticed that the extra penalty cost of 320 on a single
>> vector construction of type V16QI is greatly exaggerated, which makes
>> the final body cost unreliable, so this patch adds a maximum bound on
>> the extra penalty cost for each vector construction statement.
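The per-statement bound amounts to a clamp. In this sketch both the bound `MAX_EXTRA_CTOR_COST` and the per-element penalty model are hypothetical; the mail gives neither the actual limit nor the formula for the extra cost.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical upper bound on the extra cost of one vector construction.
const unsigned MAX_EXTRA_CTOR_COST = 128;

// Extra penalty for one construction with NUNITS elements, assuming
// (for illustration only) a fixed penalty per element.
unsigned
extra_ctor_cost (unsigned nunits, unsigned per_elem_penalty)
{
  unsigned raw = nunits * per_elem_penalty;
  return std::min (raw, MAX_EXTRA_CTOR_COST);
}
```

Without the clamp, wide element types such as V16QI scale the penalty linearly with the element count; the bound keeps one statement from dominating the loop's body cost.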
>>
>> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>>
>> Full SPEC2017 performance evaluation on Power8/Power9 with
>> option combinations:
>>   * -O2 -ftree-vectorize {,-fvect-cost-model=very-cheap} {,-ffast-math}
>>   * {-O3, -Ofast} {,-funroll-loops}
>>
>> bwaves_r degradations on P8/P9 have been fixed, nothing else
>> remarkable was observed.
>>
>> Is it ok for trunk?
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570076.html
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>>  * config/rs6000/rs6000.c (struct rs6000_cost_data): New members
>>  nstmts, nloads and extra_ctor_cost.
>>  (rs6000_density_test): Add load density related heuristics and
>>  checks; do extra costing on vector construction statements if needed.
>>  (rs6000_init_cost): Init new members.
>>  (rs6000_update_target_cost_per_stmt): New function.
>>  (rs6000_add_stmt_cost): Factor vect_nonmem hunk out to function
>>  rs6000_update_target_cost_per_stmt and call it.
>>
> 


Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]

2021-06-28 Thread Hongtao Liu via Gcc-patches
On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin  wrote:
>
> Hi!
>
> on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
> > Hi,
> >
> > PR100328 has the details of this issue; I'll try to summarize it
> > here.  In the hottest function LBM_performStreamCollideTRT of
> > SPEC2017 benchmark 519.lbm_r, there are many FMA-style expressions
> > (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA-style
> > insn has two flavors: FLOAT_REG and VSX_REG.  The VSX_REG register
> > class has 64 registers, the first 32 of which make up the whole
> > FLOAT_REG class.  There are some differences between these two
> > flavors, taking "*fma4_fpr" as an example:
> >
> > (define_insn "*fma4_fpr"
> >   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa,wa")
> >   (fma:SFDF
> > (match_operand:SFDF 1 "gpc_reg_operand" "%,wa,wa")
> > (match_operand:SFDF 2 "gpc_reg_operand" ",wa,0")
> > (match_operand:SFDF 3 "gpc_reg_operand" ",0,wa")))]
> >
> > // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
> > //  (f/d) => A floating point register, aka. FLOAT_REG.
> >
> > So for VSX_REG we only have the destructive form: when a VSX_REG
> > alternative is used, operand 2 or operand 3 is required to be the
> > same as operand 0.  Reload has to take care of this constraint and
> > create some non-free register copies if required.
> >
> > Assuming one fma insn looks like:
> >   op0 = FMA (op1, op2, op3)
> >
> > The best regclass for all of them is VSX_REG.  When op1, op2 and op3
> > are all dead, IRA simply creates three shuffle copies for them (the
> > operand order matters here: with the same freq, the one with the
> > smaller number takes precedence), but IMO both op2 and op3 should
> > take higher priority in the copy queue due to the matching constraint.
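To make the ordering argument concrete, here is a toy copy queue. It is not IRA's data structure or comparator: `copy_entry`, `copy_before` and `queue_order` are hypothetical. It sorts by frequency and breaks ties by operand number, as described above, plus an assumed extra rule that constraint copies come before shuffle copies at equal frequency.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy copy between op0 and one input operand of op0 = FMA (op1, op2, op3).
struct copy_entry
{
  int operand;        // input operand number (1, 2 or 3)
  int freq;           // execution frequency of the insn
  bool constraint_p;  // constraint copy (true) or shuffle copy (false)
};

// Higher frequency first; at equal frequency, constraint copies first
// (the assumed refinement); then the smaller operand number first.
bool
copy_before (const copy_entry &a, const copy_entry &b)
{
  if (a.freq != b.freq)
    return a.freq > b.freq;
  if (a.constraint_p != b.constraint_p)
    return a.constraint_p;
  return a.operand < b.operand;
}

// Return the operand numbers in the order their copies would be processed.
std::vector<int>
queue_order (std::vector<copy_entry> copies)
{
  std::stable_sort (copies.begin (), copies.end (), copy_before);
  std::vector<int> order;
  for (const copy_entry &c : copies)
    order.push_back (c.operand);
  return order;
}
```

With three shuffle copies of equal frequency the order is 1, 2, 3, so op1 wins; marking the op2 and op3 copies as constraint copies yields 2, 3, 1.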
> >
> > I noticed that there is a function ira_get_dup_out_num, which is
> > meant to create this kind of constraint copy, but the code below
> > appears to refuse to create one if there is an alternative with a
> > valid register class that is not likely to be spilled.
> >
> >   default:
> >     {
> >       enum constraint_num cn = lookup_constraint (str);
> >       enum reg_class cl = reg_class_for_constraint (cn);
> >       if (cl != NO_REGS
> >           && !targetm.class_likely_spilled_p (cl))
> >         goto fail;
> >
> >       ...
> >
> > I cooked up the attached patch to make IRA respect this kind of
> > matching constraint, guarded by a parameter.  As I stated in the PR,
> > I was not sure this is on the right track.  The RFC patch checks the
> > matching constraint in all alternatives; if there is an alternative
> > with a matching constraint that matches the current preferred
> > regclass (or the best class of the allocno?), it records the output
> > operand number and creates a constraint copy for it.  Normally this
> > copy gets priority over the shuffle copies, so the matching
> > constraint is satisfied with higher probability and reload doesn't
> > have to create extra copies to meet the matching constraint or the
> > desired register class.
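The basic RFC check can be modelled as a scan over the alternatives of one input operand. This toy version (hypothetical `toy_reg_class`, `operand_alt` and `dup_out_num`, not GCC's recog data or ira.c) treats a matching constraint as taking the output operand's class in that alternative, and leaves out the refinements listed for v3.

```cpp
#include <cassert>
#include <vector>

// Toy register classes and per-alternative operand constraints.
enum toy_reg_class { TOY_NO_REGS, TOY_FLOAT_REGS, TOY_VSX_REGS };

struct operand_alt
{
  toy_reg_class cl;  // class the operand may use in this alternative
  int match_op;      // output operand it must match, or -1 if none
};

// Return the output operand number to create a constraint copy with, or
// -1 if no alternative has a matching constraint in the preferred class.
int
dup_out_num (const std::vector<operand_alt> &alts, toy_reg_class preferred)
{
  for (const operand_alt &a : alts)
    if (a.match_op >= 0 && a.cl == preferred)
      return a.match_op;
  return -1;
}
```

For operand 2 of the fma pattern above, the three alternatives are roughly (FLOAT_REG), (VSX_REG) and (match operand 0, VSX_REG), so with VSX_REG preferred the sketch returns 0, and with FLOAT_REG preferred it returns -1.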
> >
> > For FMA A,B,C,D, I think ideally the copies A/B, A/C, A/D could
> > first stay as shuffle copies; later, if any of A,B,C,D gets assigned
> > a hardware register which is a VSX register (VSX_REG) but not an FP
> > register (FLOAT_REG), a cost has to be paid if we then can NOT go
> > with the VSX alternatives, so at that point it's important to respect
> > the matching constraint, and we could increase the freq of the
> > remaining copies related to it (A/B, A/C, A/D).  This idea requires
> > some side tables to record extra information and seems a bit
> > complicated in the current framework, so the proposed patch
> > aggressively emphasizes the matching constraint at the time of
> > creating copies.
> >
>
> Compared with the original patch (v1), this patch v3 has considered
> the following (this should be v2 for this mailing list, but I bumped
> it to be consistent with the PR):
>
>   - Excluding the case where, for one preferred register class,
>     there are two or more alternatives and one of them has the
>     matching constraint while another doesn't.  In that case, even
>     if the given operand is assigned a hardware reg which doesn't
>     meet the matching constraint, it can simply use the alternative
>     without the matching constraint, so no register move is needed.
>     One typical case is define_insn *mov_internal2 on rs6000.  So we
>     shouldn't create a constraint copy for it.
>
>   - Disabling this when a register move within the same register
>     class is free, since the register move needed to meet the
>     constraint is then considered free.
>
>   - Making it on by default, as suggested by Segher and Vladimir; we
>     hope to get rid of the parameter if the benchmarking results
>     look good on major targets.
>
>   - Tweaking the cost when either side of the matching constraint
>     is a hardware register.  Before this patch, the constraint
>     copy was simply taken as a real move insn for pref and
>     confl

Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]

2021-06-28 Thread Hongtao Liu via Gcc-patches
On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu  wrote:
>
> On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin  wrote:

Re: [PATCH] aix: handle 64bit inodes for include directories

2021-06-28 Thread CHIGOT, CLEMENT via Gcc-patches
>On 6/23/2021 12:53 AM, CHIGOT, CLEMENT via Gcc-patches wrote:
>> Hi David,
>>
>> Did you have a chance to take a look at this patch?
>>
>> Thanks,
>> Clément
>>
>>
>>> +DavidMalcolm
>>>
>>> Can you review this patch when you have a moment?
>>>
>>> Thanks, David
>>>
>>> On Mon, May 17, 2021 at 3:05 PM David Edelsohn  wrote:
 The aix.h change is okay with me, but you need to get approval for the
 incpath.c and cpplib.h parts of the patch from the appropriate
 maintainers.

 Thanks, David

 On Mon, May 17, 2021 at 7:44 AM CHIGOT, CLEMENT  
 wrote:
> On AIX, stat stores inodes in 32 bits even when using LARGE_FILES.
> If the inode is larger, it returns -1 in st_ino.
> Thus, when incpath.c compares include directories, if several of
> them have 64-bit inodes, they will be considered duplicates.
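A toy model of why the false duplicates appear, assuming (as described above) that the 32-bit stat reports all ones (-1) in st_ino when the real inode does not fit: two distinct directories on the same device then share one (dev, ino) key. `dir_id`, `stat32_ino` and `count_unique` are hypothetical and only mimic the idea of remove_duplicates in incpath.c.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Key used to detect duplicate include directories.
struct dir_id
{
  std::uint64_t dev;
  std::uint64_t ino;
};

// What a 32-bit st_ino would report: the inode if it fits, otherwise the
// all-ones value (-1), per the behaviour described above.
std::uint64_t
stat32_ino (std::uint64_t real_ino)
{
  const std::uint32_t all_ones = std::numeric_limits<std::uint32_t>::max ();
  return real_ino > all_ones ? all_ones : real_ino;
}

// Count directories that survive a (dev, ino) de-duplication pass.
std::size_t
count_unique (const std::vector<dir_id> &dirs)
{
  std::size_t n = 0;
  for (std::size_t i = 0; i < dirs.size (); ++i)
    {
      bool dup = false;
      for (std::size_t j = 0; j < i; ++j)
        if (dirs[j].dev == dirs[i].dev && dirs[j].ino == dirs[i].ino)
          dup = true;
      if (!dup)
        ++n;
    }
  return n;
}
```

With two directories whose real inodes are 0x100000001 and 0x100000002 on the same device, the full keys keep both, but the 32-bit view collapses them into one, so the second directory is dropped as a duplicate.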
>
> gcc/ChangeLog:
> 2021-05-06  Clément Chigot  
>
>  * configure.ac: Check sizeof ino_t and dev_t.
>  * config.in: Regenerate.
>  * configure: Regenerate.
>  * config/rs6000/aix.h (HOST_STAT_FOR_64BIT_INODES): New define.
>  * incpath.c (HOST_STAT_FOR_64BIT_INODES): New define.
>  (remove_duplicates): Use it.
>
> libcpp/ChangeLog:
> 2021-05-06  Clément Chigot  
>
>  * configure.ac: Check sizeof ino_t and dev_t.
>  * config.in: Regenerate.
>  * configure: Regenerate.
>  * include/cpplib.h (INO_T_CPP): Change for AIX.
>  (DEV_T_CPP): New macro.
>  (struct cpp_dir): Use it.
> So my worry here is that this is really a host property -- i.e., this is
> behavior of where GCC runs, not the target for which GCC is generating code.
> 
> That implies that the change in aix.h is wrong.  aix.h is for the 
> target, not the host -- you don't want to define something like 
> HOST_STAT_FOR_64BIT_INODES there.
>
> You'd want to be triggering this behavior via a host fragment, x-aix, or 
> better yet via an autoconf test.

Indeed, would this version be better? I'm not sure about the configure test.
But since we retrieve the sizes of dev_t and ino_t just above, I'm assuming
those are the types stat uses directly. At least that's the case on AIX, and
this test is only done for AIX.

Clément

0001-aix-handle-64bit-inodes-for-include-directories.patch
Description: 0001-aix-handle-64bit-inodes-for-include-directories.patch


Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]

2021-06-28 Thread Kewen.Lin via Gcc-patches
on 2021/6/28 3:20 PM, Hongtao Liu wrote:
> On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu  wrote:
>>
>> On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin  wrote:

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-28 Thread Richard Biener via Gcc-patches
On Fri, Jun 25, 2021 at 10:52 PM Martin Sebor  wrote:
>
> On 6/1/21 3:38 PM, Jason Merrill wrote:
> > On 6/1/21 3:56 PM, Martin Sebor wrote:
> >> On 5/27/21 2:53 PM, Jason Merrill wrote:
> >>> On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:
>  On 4/27/21 8:04 AM, Richard Biener wrote:
> > On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  wrote:
> >>
> >> On 4/27/21 1:58 AM, Richard Biener wrote:
> >>> On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
> >>>  wrote:
> 
>  PR 90904 notes that auto_vec is unsafe to copy and assign because
>  the class manages its own memory but doesn't define (or delete)
>  either special function.  Since I first ran into the problem,
>  auto_vec has grown a move ctor and move assignment from
>  a dynamically-allocated vec but still no copy ctor or copy
>  assignment operator.
> 
>  The attached patch adds the two special functions to auto_vec along
>  with a few simple tests.  It makes auto_vec safe to use in containers
>  that expect copyable and assignable element types and passes bootstrap
>  and regression testing on x86_64-linux.
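For reference, a minimal sketch of the two special members PR 90904 is about, on a hypothetical owning array class (`owning_vec`, not GCC's auto_vec or vec): the copy constructor allocates new storage, so a copy never shares a buffer with its source and the two destructors never free the same memory, and assignment uses copy-and-swap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning array with deep-copy semantics.
template <typename T>
class owning_vec
{
public:
  owning_vec () : m_data (nullptr), m_len (0) {}
  explicit owning_vec (std::size_t n) : m_data (new T[n] ()), m_len (n) {}
  ~owning_vec () { delete[] m_data; }

  // Deep copy: fresh storage, element-wise copy.
  owning_vec (const owning_vec &other)
    : m_data (other.m_len ? new T[other.m_len] : nullptr),
      m_len (other.m_len)
  {
    std::copy (other.m_data, other.m_data + m_len, m_data);
  }

  // Copy-and-swap: handles copy, move and self-assignment.
  owning_vec &operator= (owning_vec other)
  {
    std::swap (m_data, other.m_data);
    std::swap (m_len, other.m_len);
    return *this;
  }

  // Move: steal the buffer and leave OTHER empty.
  owning_vec (owning_vec &&other) noexcept
    : m_data (other.m_data), m_len (other.m_len)
  {
    other.m_data = nullptr;
    other.m_len = 0;
  }

  T &operator[] (std::size_t i) { return m_data[i]; }
  std::size_t length () const { return m_len; }

private:
  T *m_data;
  std::size_t m_len;
};
```

Copy-and-swap keeps assignment simple and exception-safe at the cost of always allocating; a production container would likely reuse existing capacity. Whether such copies should be allowed at all is exactly what the replies debate.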
> >>>
> >>> The question is whether we want such uses to appear since those
> >>> can be quite inefficient?  Thus the option is to delete those
> >>> operators?
> >>
> >> I would strongly prefer the generic vector class to have the properties
> >> expected of any other generic container: copyable and assignable.  If
> >> we also want another vector type with this restriction I suggest adding
> >> another "noncopyable" type and making that property explicit in its name.
> >> I can submit one in a followup patch if you think we need one.
> >
> > I'm not sure (and not strictly against the copy and assign).  Looking around
> > I see that vec<> does not do deep copying.  Making auto_vec<> do it
> > might be surprising (I added the move capability to match how vec<>
> > is used - as "reference" to a vector)
> 
>  The vec base classes are special: they have no ctors at all (because
>  of their use in unions).  That's something we might have to live with
>  but it's not a model to follow in ordinary containers.
> >>>
> >>> I don't think we have to live with it anymore, now that we're writing
> >>> C++11.
> >>>
>  The auto_vec class was introduced to fill the need for a conventional
>  sequence container with a ctor and dtor.  The missing copy ctor and
>  assignment operators were an oversight, not a deliberate feature.
>  This change fixes that oversight.
> 
>  The revised patch also adds a copy ctor/assignment to the auto_vec
>  primary template (that's also missing it).  In addition, it adds
>  a new class called auto_vec_ncopy that disables copying and
>  assignment as you prefer.
> >>>
> >>> Hmm, adding another class doesn't really help with the confusion
> >>> richi mentions.  And many uses of auto_vec will pass them as vec,
> >>> which will still do a shallow copy.  I think it's probably better to
> >>> disable the copy special members for auto_vec until we fix vec<>.
> >>
> >> There are at least a couple of problems that get in the way of fixing
> >> all of vec to act like a well-behaved C++ container:
> >>
> >> 1) The embedded vec has a trailing "flexible" array member with its
> >> instances having different size.  They're initialized by memset and
> >> copied by memcpy.  The class can't have copy ctors or assignments
> >> but it should disable/delete them instead.
> >>
> >> 2) The heap-based vec is used throughout GCC with the assumption of
> >> shallow copy semantics (not just as function arguments but also as
> >> members of other such POD classes).  This can be changed by providing
> >> copy and move ctors and assignment operators for it, and also for
> >> some of the classes in which it's a member and that are used with
> >> the same assumption.
> >>
> >> 3) The heap-based vec::block_remove() assumes its elements are PODs.
> >> That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
> >> and tree-vect-patterns.c).
> >>
> >> I took a stab at both and while (1) is easy, (2) is shaping up to
> >> be a big and tricky project.  Tricky because it involves using
> >> std::move in places where what's moved is subsequently still used.
> >> I can keep plugging away at it but it won't change the fact that
> >> the embedded and heap-based vecs have different requirements.
> >>
> >> It doesn't seem to me that having a safely copyable auto_vec needs
> >> to be put on hold until the rats nest above is untangled.  It won't
> >> make anything worse than it is.  (I have a project that depends on
> >> a sane auto_vec working).
> >>
> >> A couple of alternatives to solving this are to use std::vector or
> >> write an equivalent vector class just for GC
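To make the copy-semantics question in this thread concrete, here is a minimal sketch of what "copyable and assignable" means for an owning container. `owning_vec` and `noncopyable_vec` are hypothetical names invented for illustration; they are not GCC's `auto_vec` or `vec<>`, whose storage layout and allocation differ. The copy constructor and copy assignment duplicate the buffer (deep copy) instead of sharing it, and the second class shows the "delete the copy operations" alternative that was also proposed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical owning vector (NOT GCC's auto_vec).  Because it owns its
// buffer, it defines copy operations that duplicate the storage.
template <typename T>
class owning_vec
{
public:
  owning_vec () = default;
  ~owning_vec () { delete[] m_data; }

  // Deep copy: allocate a separate buffer and copy the elements.
  owning_vec (const owning_vec &other)
    : m_data (other.m_cap ? new T[other.m_cap] : nullptr),
      m_len (other.m_len), m_cap (other.m_cap)
  {
    std::copy (other.m_data, other.m_data + other.m_len, m_data);
  }

  // Move: take over the buffer; the source is left empty.
  owning_vec (owning_vec &&other) noexcept { swap (other); }

  // Copy-and-swap: one operator handles both copy and move assignment.
  owning_vec &operator= (owning_vec other) noexcept
  {
    swap (other);
    return *this;
  }

  void swap (owning_vec &other) noexcept
  {
    std::swap (m_data, other.m_data);
    std::swap (m_len, other.m_len);
    std::swap (m_cap, other.m_cap);
  }

  void push (const T &x)
  {
    T val = x;  // copy first: x may refer into the buffer being reallocated
    if (m_len == m_cap)
      {
        std::size_t new_cap = m_cap ? 2 * m_cap : 4;
        T *p = new T[new_cap];
        std::copy (m_data, m_data + m_len, p);
        delete[] m_data;
        m_data = p;
        m_cap = new_cap;
      }
    m_data[m_len++] = val;
  }

  std::size_t length () const { return m_len; }
  T &operator[] (std::size_t i) { return m_data[i]; }
  const T &operator[] (std::size_t i) const { return m_data[i]; }
  const T *address () const { return m_data; }

private:
  T *m_data = nullptr;
  std::size_t m_len = 0;
  std::size_t m_cap = 0;
};

// The alternative raised in the thread: a vector that refuses to be copied.
template <typename T>
struct noncopyable_vec : owning_vec<T>
{
  noncopyable_vec () = default;
  noncopyable_vec (const noncopyable_vec &) = delete;
  noncopyable_vec &operator= (const noncopyable_vec &) = delete;
};
```

With deep copies, modifying a copy never affects the original, which is the property generic containers (e.g. a container of vectors) rely on; the cost Richard raises is that every such copy allocates and copies the whole buffer.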

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-28 Thread Xionghu Luo via Gcc-patches



On 2021/6/25 18:02, Richard Biener wrote:
> On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo  wrote:
>>
>>
>>
>> On 2021/6/25 16:54, Richard Biener wrote:
>>> On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
>>>  wrote:

 From: Xiong Hu Luo 

 adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
 on Power.  For example, it generates mismatched address offsets after
 adjusting the IV update statement position:

  [local count: 70988443]:
 _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
 ivtmp.30_415 = ivtmp.30_414 + 1;
 _34 = ref_180 + 18446744073709551615;
 _86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
 if (_84 == _86)
 goto ; [94.50%]
 else
 goto ; [5.50%]

 Disabling it produces:

  [local count: 70988443]:
 _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
 _86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
 ivtmp.30_415 = ivtmp.30_414 + 1;
 if (_84 == _86)
 goto ; [94.50%]
 else
 goto ; [5.50%]

 Then the later loop unroll pass could benefit from the same address offset
 with different base addresses, which reduces register dependency.
 This patch could improve performance by 10% for a typical case on Power;
 no performance change was observed for x86 or AArch64 because small loops
 are not unrolled on these platforms.  Any comments?
>>>
>>> The case you quote is special in that if we hoisted the IV update before
>>> the other MEM _also_ used in the condition it would be fine again.
>>
>> Thanks.  I tried to hoist the IV update statement before the first MEM
>> (Fix 2).  It shows even worse performance because the loop is not
>> unrolled (two more "base-1" are generated in GIMPLE, so loop->ninsns is
>> 11 and the small loop is not unrolled).  Changing the threshold from 10
>> to 12 in rs6000_loop_unroll_adjust would make it unroll 2 times as well,
>> and the performance is then the SAME as with the IV update statement in
>> the *MIDDLE* (trunk).  From the ASM, we can see the index register %r4 is
>> used in two iterations, which may be a bottleneck for hiding instruction
>> latency?
>>
>> So it seems reasonable that performance would be better if we keep the
>> IV update statement *LAST* (Fix 1).
>>
>> (Fix 2):
>> [local count: 70988443]:
>>   ivtmp.30_415 = ivtmp.30_414 + 1;
>>   _34 = ip_229 + 18446744073709551615;
>>   _84 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
>>   _33 = ref_180 + 18446744073709551615;
>>   _86 = MEM[(uint8_t *)_33 + ivtmp.30_415 * 1];
>>   if (_84 == _86)
>>     goto ; [94.50%]
>>   else
>>     goto ; [5.50%]
>>
>>
>> .L67:
>>  lbzx %r12,%r24,%r4
>>  lbzx %r25,%r7,%r4
>>  cmpw %cr0,%r12,%r25
>>  bne %cr0,.L11
>>  mr %r26,%r4
>>  addi %r4,%r4,1
>>  lbzx %r12,%r24,%r4
>>  lbzx %r25,%r7,%r4
>>  mr %r6,%r26
>>  cmpw %cr0,%r12,%r25
>>  bne %cr0,.L11
>>  mr %r26,%r4
>> .L12:
>>  cmpdi %cr0,%r10,1
>>  addi %r4,%r26,1
>>  mr %r6,%r26
>>  addi %r10,%r10,-1
>>  bne %cr0,.L67
>>
>>>
>>> Now, adjust_iv_update_pos doesn't seem to check that the
>>> condition actually uses the IV use stmt def, so it likely applies to
>>> too many cases.
>>>
>>> Unfortunately the introducing rev didn't come with a testcase,
>>> but still I think fixing up adjust_iv_update_pos is better than
>>> introducing a way to short-cut it per target decision.
>>>
>>> One "fix" might be to add a check that either the condition
>>> lhs or rhs is the def of the IV use and the other operand
>>> is invariant.  Or if it's of similar structure hoist across the
>>> other iv-use as well.  Not that I understand the argument
>>> about the overlapping life-range.
>>>
>>> You also don't provide a complete testcase ...
>>>
>>
>> Attached is the test code; I will also add it to the patch in a future version.
>> The issue comes from a very small hot loop:
>>
>>   do {
>>     len++;
>>   } while (len < maxlen && ip[len] == ref[len]);
> 
> unsigned int foo (unsigned char *ip, unsigned char *ref, unsigned int maxlen)
> {
>   unsigned int len = 2;
>   do {
>     len++;
>   } while (len < maxlen && ip[len] == ref[len]);
>   return len;
> }
> 
> I can see the effect on this loop on x86_64 as well, we end up with
> 
> .L6:
>  movzbl  (%rdi,%rax), %ecx
>  addq $1, %rax
>  cmpb -1(%rsi,%rax), %cl
>  jne .L1
> .L3:
>  movl %eax, %r8d
>  cmpl %edx, %eax
>  jb  .L6
> 
> but without the trick it is
> 
> .L6:
>  movzbl  (%rdi,%rax), %r8d
>  movzbl  (%rsi,%rax), %ecx
>  addq $1, %rax
>  cmpb %cl, %r8b
>  jne .L1
> .L3:
>  movl %eax, %r9d
>  cmpl %edx, %eax
>  jb  .L6
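For reference, here is a self-contained copy of the quoted loop with a few made-up inputs that pin down its semantics; it makes no performance claim. To see where ivopts places the IV update, one would compile it on the target of interest (for example with `-O2 -fdump-tree-ivopts`) and inspect the dump.

```cpp
#include <cassert>

// The hot loop from the thread, unchanged: starting at len = 3, keep
// advancing while len < maxlen and ip[len] == ref[len].  It returns the
// first index >= 3 where the bytes differ, maxlen if they all match, or
// 3 when maxlen <= 3 (the do-while body always runs once).
unsigned int foo (unsigned char *ip, unsigned char *ref, unsigned int maxlen)
{
  unsigned int len = 2;
  do {
    len++;
  } while (len < maxlen && ip[len] == ref[len]);
  return len;
}
```

Each iteration does two byte loads at the same index from different bases, which is exactly where the placement of the `len` increment relative to the second load matters.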

I verified this small piece of code on x86; there is no performance
change with or without adjust_iv_update_pos (I've checked the ASM
ex

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-28 Thread Richard Biener via Gcc-patches
On Sat, Jun 26, 2021 at 12:36 AM Martin Sebor  wrote:
>
> On 6/25/21 4:11 PM, Jason Merrill wrote:
> > On 6/25/21 4:51 PM, Martin Sebor wrote:
> >> On 6/1/21 3:38 PM, Jason Merrill wrote:
> >>> On 6/1/21 3:56 PM, Martin Sebor wrote:
>  On 5/27/21 2:53 PM, Jason Merrill wrote:
> > On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:
> >> On 4/27/21 8:04 AM, Richard Biener wrote:
> >>> On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor 
> >>> wrote:
> 
>  On 4/27/21 1:58 AM, Richard Biener wrote:
> > On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
> >  wrote:
> >>
> >> PR 90904 notes that auto_vec is unsafe to copy and assign because
> >> the class manages its own memory but doesn't define (or delete)
> >> either special function.  Since I first ran into the problem,
> >> auto_vec has grown a move ctor and move assignment from
> >> a dynamically-allocated vec but still no copy ctor or copy
> >> assignment operator.
> >>
> >> The attached patch adds the two special functions to auto_vec along
> >> with a few simple tests.  It makes auto_vec safe to use in containers
> >> that expect copyable and assignable element types and passes bootstrap
> >> and regression testing on x86_64-linux.
> >
> > The question is whether we want such uses to appear since those
> > can be quite inefficient?  Thus the option is to delete those
> > operators?
> 
>  I would strongly prefer the generic vector class to have the properties
>  expected of any other generic container: copyable and assignable.  If
>  we also want another vector type with this restriction I suggest adding
>  another "noncopyable" type and making that property explicit in its
>  name.  I can submit one in a followup patch if you think we need one.
> >>>
> >>> I'm not sure (and not strictly against the copy and assign).
> >>> Looking around I see that vec<> does not do deep copying.  Making
> >>> auto_vec<> do it might be surprising (I added the move capability to
> >>> match how vec<> is used - as "reference" to a vector)
> >>
> >> The vec base classes are special: they have no ctors at all (because
> >> of their use in unions).  That's something we might have to live with
> >> but it's not a model to follow in ordinary containers.
> >
> > I don't think we have to live with it anymore, now that we're
> > writing C++11.
> >
> >> The auto_vec class was introduced to fill the need for a conventional
> >> sequence container with a ctor and dtor.  The missing copy ctor and
> >> assignment operators were an oversight, not a deliberate feature.
> >> This change fixes that oversight.
> >>
> >> The revised patch also adds a copy ctor/assignment to the auto_vec
> >> primary template (that's also missing it).  In addition, it adds
> >> a new class called auto_vec_ncopy that disables copying and
> >> assignment as you prefer.
> >
> > Hmm, adding another class doesn't really help with the confusion
> > richi mentions.  And many uses of auto_vec will pass them as vec,
> > which will still do a shallow copy.  I think it's probably better
> > to disable the copy special members for auto_vec until we fix vec<>.
> 
>  There are at least a couple of problems that get in the way of fixing
>  all of vec to act like a well-behaved C++ container:
> 
>  1) The embedded vec has a trailing "flexible" array member with its
>  instances having different size.  They're initialized by memset and
>  copied by memcpy.  The class can't have copy ctors or assignments
>  but it should disable/delete them instead.
> 
>  2) The heap-based vec is used throughout GCC with the assumption of
>  shallow copy semantics (not just as function arguments but also as
>  members of other such POD classes).  This can be changed by providing
>  copy and move ctors and assignment operators for it, and also for
>  some of the classes in which it's a member and that are used with
>  the same assumption.
> 
>  3) The heap-based vec::block_remove() assumes its elements are PODs.
>  That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
>  and tree-vect-patterns.c).
> 
>  I took a stab at both and while (1) is easy, (2) is shaping up to
>  be a big and tricky project.  Tricky because it involves using
>  std::move in places where what's moved is subsequently still used.
>  I can keep plugging away at it but it won't change the fact that
>  the embedded and heap-based vecs have different requirements.
> 
>  It doesn't seem to me that having a safely copyable auto_vec ne

Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-06-28 Thread Richard Biener via Gcc-patches
On Sun, Jun 27, 2021 at 5:46 PM Aldy Hernandez  wrote:
>
>
>
> On 6/25/21 9:38 AM, Richard Biener wrote:
> > On Thu, Jun 24, 2021 at 5:01 PM Andrew MacLeod  wrote:
> >>
> >> On 6/24/21 9:25 AM, Andrew MacLeod wrote:
> >>> On 6/24/21 8:29 AM, Richard Biener wrote:
> >>>
> >>>
> >>> The original function in EVRP currently looks like:
> >>>
> >>>   === BB 2 
> >>>   :
> >>>  if (a_5(D) == b_6(D))
> >>>goto ; [INV]
> >>>  else
> >>>goto ; [INV]
> >>>
> >>> === BB 8 
> >>> Equivalence set : [a_5(D), b_6(D)] edge 2->8 provides
> >>> a_5 and b_6 as equivalences
> >>>   :
> >>>  goto ; [100.00%]
> >>>
> >>> === BB 6 
> >>>   :
> >>>  # i_1 = PHI <0(8), i_10(5)>
> >>>  if (i_1 < a_5(D))
> >>>goto ; [INV]
> >>>  else
> >>>goto ; [INV]
> >>>
> >>> === BB 3 
> >>> Relational : (i_1 < a_5(D)) edge 6->3 provides
> >>> this relation
> >>>   :
> >>>  if (i_1 == b_6(D))
> >>>goto ; [INV]
> >>>  else
> >>>goto ; [INV]
> >>>
> >>>
> >>> So it knows that a_5 and b_6 are equivalent, and it knows that i_1 <
> >>> a_5 in BB3 as well.
> >>>
> >>> So we should be able to indicate that i_1 == b_6 is [0,0].  We
> >>> currently aren't.  I think I had turned on equivalence mapping during
> >>> relational processing, so it should be able to tag that without
> >>> transitive relations...  I'll have a look at why.
> >>>
> >>> And once we get a bit further along, you will be able to access this
> >>> without ranger.. if one wants to simply register the relations directly.
> >>>
> >>> Anyway, I'll get back to you on why it's currently being missed.
> >>>
> >>> Andrew
> >>>
> >>>
> >>>
> >> As promised.  There was a typo in the equivalency comparisons... so it
> >> was getting missed.  With the fix, the oracle identifies the relation
> >> and evrp will now fold that expression away and the IL becomes:
> >>
> >>  :
> >> if (a_5(D) == b_6(D))
> >>   goto ; [INV]
> >> else
> >>   goto ; [INV]
> >>
> >>  :
> >> i_10 = i_1 + 1;
> >>
> >>  :
> >> # i_1 = PHI <0(2), i_10(3)>
> >> if (i_1 < a_5(D))
> >>   goto ; [INV]
> >> else
> >>   goto ; [INV]
> >>
> >>  :
> >> return;
> >>
> >> For the other cases you quote, there are no predictions such that if a
> >> != 0 then this equivalency exists...
> >>
> >> +  if (a != 0)
> >> +{
> >> +  c = b;
> >> +}
> >>
> >> but the oracle would register that in the TRUE block,  c and b are
> >> equivalent... so some other pass that was interested in tracking
> >> conditions that make a block relevant would be able to compare relations...
> >
> > I guess to fully leverage optimizations for cases like
> >
> >if (a != 0)
> >  c = b;
> >...
> >if (a != 0)
> >  {
> >  if (c == b)
> > ...
> >  }
> >
> > one would need to consider the "optimally jump threaded path" to the
> > program point where the to be optimized stmt resides, making all
> > originally conditional but on the jump threaded path unconditional
> > relations and equivalences available.
> >
> > For VN that could be done by unwinding to the CFG merge after
> > the first if (a != 0), treating only one of the predecessor edges
> > as executable and registering the appropriate a != 0 result and
> > continue VN up to the desired point, committing to the result
> > until before the CFG merge after the second if (a != 0).  And then
> > unwinding again for the "else" path.  Sounds like a possible
> > explosion in complexity as well if second-order opportunities
> > arise.
> >
> > That is, we'd do simplifications exposed by jump threading but
> > without actually doing the jump threading (which will of course
> > not allow all possible simplifications w/o inserting extra PHIs
> > for computations we might want to re-use).
>
> FWIW, as I mention in the PR, if the upcoming threader work could be
> taught to use the relation oracle, it could easily solve the conditional
> flowing through the a!=0 path.  However, we wouldn't be able to thread
> it because in this particular case, the path crosses loop boundaries.
>
> I leave it to Jeff/others to pontificate on whether the jump-threader
> path duplicator could be taught to thread through loops.

If the path doesn't end outside of the loop then it will usually
create an alternate entry and thus turn the loop into an irreducible
region - that's quite a bad thing to do unless we somehow get a
high profitability hint and are late in the compilation.

Richard.

>
> Aldy
>
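The IL discussed above corresponds to source roughly like the following. This is a hypothetical reconstruction from the dump, not the testcase attached to PR 101186. It shows why `i_1 == b_6(D)` can fold to false: on the path where `a == b`, the loop condition gives `i < a`, and therefore `i < b`.

```cpp
#include <cassert>

static int calls;  // how many times the guarded branch was reached

// Hypothetical reconstruction of the quoted IL: on the a == b path every
// iteration satisfies i < a, and since a == b that means i < b, so the
// test i == b can never be true and the branch is dead.
void f (int a, int b)
{
  if (a == b)
    for (int i = 0; i < a; i++)
      if (i == b)
        calls++;
}
```

Folding the inner test needs both facts at once: the equivalence `a == b` registered on the outer edge and the relation `i < a` registered on the loop edge, which is what the relation oracle fix provides.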


Re: [EXTERNAL] Re: rs6000: Fix typos in float128 ISA3.1 support

2021-06-28 Thread Kewen.Lin via Gcc-patches
on 2021/6/25 3:36 AM, Segher Boessenkool wrote:
> On Thu, Jun 24, 2021 at 05:32:20PM +0800, Kewen.Lin wrote:
>> on 2021/6/24 12:58 AM, Segher Boessenkool wrote:
>>> On Wed, Jun 23, 2021 at 12:17:07PM +0800, Kewen.Lin wrote:
>> +#ifdef FLOAT128_HW_INSNS_ISA3_1
>>  TFtype __floattikf (TItype_ppc)
>>__attribute__ ((__ifunc__ ("__floattikf_resolve")));
>
> I wonder if we now need TItype_ppc at all anymore, btw?

 Sorry, I don't quite follow this question.
>>>
>>> I thought it may do the same as just TItype now, but the ifunc stuff
>>> probably makes it different still :-)
>>
>> Ah, thanks for the clarification!  If I read it right, TItype is defined
>> with __attribute__ ((mode (TI))) while TItype_ppc is defined with 
>> __attribute__ ((__mode__ (__TI__))); the latter spelling looks special.
> 
> I managed to read things wrong, I thought there was some ifunc stuff in
> the definition of TItype_ppc.  Of course there is not, it is just
> setting the mode.
> 
> mode(__TI__) is just the more portable way of writing mode(TI), the
> latter will not work if something #define's TI (you cannot do that with
> __TI__, you are not allowed to by the C standard, in application code).
> 
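A small illustration of the point about `mode (__TI__)` versus `mode (TI)`: the macro `TI` below is a hypothetical stand-in for some header that happens to define it. GCC strips a leading and trailing `__` from mode names, so `__TI__` names the same 128-bit integer mode, and a reserved identifier like `__TI__` cannot legitimately be `#define`d by application code. This sketch assumes a 64-bit target where TImode exists (e.g. x86_64 or powerpc64).

```cpp
#include <cassert>

// Hypothetical: pretend some header has defined TI as a macro.  After this,
// mode (TI) would no longer see the identifier TI, but the reserved
// spelling __TI__ is unaffected.
#define TI 1

// __TI__ names the same mode as TI: a 128-bit integer.
// Assumes a 64-bit target where TImode is available.
typedef int ti_type __attribute__ ((__mode__ (__TI__)));
typedef unsigned int uti_type __attribute__ ((__mode__ (__TI__)));
```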

Yeah, thanks for the note.  It looks better to update the generic
macro with this ppc-style "__" spelling and remove the ppc one. :-)

A related bug, PR101235, was just opened.  I noticed the culprit commit
was backported to GCC 11; is it OK to backport this fix to GCC 11 if
everything goes well for one more week?


BR,
Kewen


Re: [PATCH] New hook adjust_iv_update_pos

2021-06-28 Thread Richard Biener via Gcc-patches
On Mon, Jun 28, 2021 at 10:07 AM Xionghu Luo  wrote:
>
>
>
> On 2021/6/25 18:02, Richard Biener wrote:
> > On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo  wrote:
> >>
> >>
> >>
> >> On 2021/6/25 16:54, Richard Biener wrote:
> >>> On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
> >>>  wrote:
> 
>  From: Xiong Hu Luo 
> 
>  adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
>  on Power.  For example, it generates mismatched address offsets after
>  adjusting the IV update statement position:
> 
>   [local count: 70988443]:
>  _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
>  ivtmp.30_415 = ivtmp.30_414 + 1;
>  _34 = ref_180 + 18446744073709551615;
>  _86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
>  if (_84 == _86)
>  goto ; [94.50%]
>  else
>  goto ; [5.50%]
> 
>  Disabling it produces:
> 
>   [local count: 70988443]:
>  _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
>  _86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
>  ivtmp.30_415 = ivtmp.30_414 + 1;
>  if (_84 == _86)
>  goto ; [94.50%]
>  else
>  goto ; [5.50%]
> 
>  Then the later loop unroll pass could benefit from the same address offset
>  with different base addresses, which reduces register dependency.
>  This patch could improve performance by 10% for a typical case on Power;
>  no performance change was observed for x86 or AArch64 because small loops
>  are not unrolled on these platforms.  Any comments?
> >>>
> >>> The case you quote is special in that if we hoisted the IV update before
> >>> the other MEM _also_ used in the condition it would be fine again.
> >>
> >> Thanks.  I tried to hoist the IV update statement before the first MEM
> >> (Fix 2).  It shows even worse performance because the loop is not
> >> unrolled (two more "base-1" are generated in GIMPLE, so loop->ninsns is
> >> 11 and the small loop is not unrolled).  Changing the threshold from 10
> >> to 12 in rs6000_loop_unroll_adjust would make it unroll 2 times as well,
> >> and the performance is then the SAME as with the IV update statement in
> >> the *MIDDLE* (trunk).  From the ASM, we can see the index register %r4 is
> >> used in two iterations, which may be a bottleneck for hiding instruction
> >> latency?
> >>
> >> So it seems reasonable that performance would be better if we keep the
> >> IV update statement *LAST* (Fix 1).
> >>
> >> (Fix 2):
> >> [local count: 70988443]:
> >>   ivtmp.30_415 = ivtmp.30_414 + 1;
> >>   _34 = ip_229 + 18446744073709551615;
> >>   _84 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
> >>   _33 = ref_180 + 18446744073709551615;
> >>   _86 = MEM[(uint8_t *)_33 + ivtmp.30_415 * 1];
> >>   if (_84 == _86)
> >>     goto ; [94.50%]
> >>   else
> >>     goto ; [5.50%]
> >>
> >>
> >> .L67:
> >>  lbzx %r12,%r24,%r4
> >>  lbzx %r25,%r7,%r4
> >>  cmpw %cr0,%r12,%r25
> >>  bne %cr0,.L11
> >>  mr %r26,%r4
> >>  addi %r4,%r4,1
> >>  lbzx %r12,%r24,%r4
> >>  lbzx %r25,%r7,%r4
> >>  mr %r6,%r26
> >>  cmpw %cr0,%r12,%r25
> >>  bne %cr0,.L11
> >>  mr %r26,%r4
> >> .L12:
> >>  cmpdi %cr0,%r10,1
> >>  addi %r4,%r26,1
> >>  mr %r6,%r26
> >>  addi %r10,%r10,-1
> >>  bne %cr0,.L67
> >>
> >>>
> >>> Now, adjust_iv_update_pos doesn't seem to check that the
> >>> condition actually uses the IV use stmt def, so it likely applies to
> >>> too many cases.
> >>>
> >>> Unfortunately the introducing rev didn't come with a testcase,
> >>> but still I think fixing up adjust_iv_update_pos is better than
> >>> introducing a way to short-cut it per target decision.
> >>>
> >>> One "fix" might be to add a check that either the condition
> >>> lhs or rhs is the def of the IV use and the other operand
> >>> is invariant.  Or if it's of similar structure hoist across the
> >>> other iv-use as well.  Not that I understand the argument
> >>> about the overlapping life-range.
> >>>
> >>> You also don't provide a complete testcase ...
> >>>
> >>
> >> Attached is the test code; I will also add it to the patch in a future version.
> >> The issue comes from a very small hot loop:
> >>
> >>   do {
> >>     len++;
> >>   } while (len < maxlen && ip[len] == ref[len]);
> >
> > unsigned int foo (unsigned char *ip, unsigned char *ref, unsigned int maxlen)
> > {
> >   unsigned int len = 2;
> >   do {
> >     len++;
> >   } while (len < maxlen && ip[len] == ref[len]);
> >   return len;
> > }
> >
> > I can see the effect on this loop on x86_64 as well, we end up with
> >
> > .L6:
> >  movzbl  (%rdi,%rax), %ecx
> >  addq $1, %rax
> >  cmpb -1(%rsi,%rax), %cl
> >  jne .L1
> > .L3:
> >  movl %eax, %r8d
> >  cmpl %edx, %eax
> >  jb  .L6
> >
> > but without the trick it is
> >
> > .L6:
> >  movzbl  (%rdi,%r

[PATCH] tree-optimization/101207 - fix BB reduc permute elide with life stmts

2021-06-28 Thread Richard Biener
This fixes breakage of live lane extracts from permuted loads whose
permutation we elide during BB reduction vectorization, by handling the
un-permuting the same way as the regular eliding code: apply the reverse
permute to both the scalar stmts and the load permutation.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
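The invariant behind "apply the reverse permute to both" can be shown with a small model. `permute` below is a hypothetical helper, not GCC code; its forward/reverse semantics are an assumption modeled on how the `vect_slp_permute (..., true)` calls in the diff are used. Lane `i` holds scalar stmt `stmts[i]`, which loads element `load_perm[i]`; un-permuting both vectors with the same permutation keeps every stmt paired with the element it loads.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not GCC's vect_slp_permute): forward applies
// vec[i] = old[perm[i]]; reverse undoes that with vec[perm[i]] = old[i].
template <typename T>
void permute (const std::vector<unsigned> &perm, std::vector<T> &vec,
              bool reverse)
{
  std::vector<T> saved (vec);
  for (unsigned i = 0; i < vec.size (); ++i)
    if (reverse)
      vec[perm[i]] = saved[i];
    else
      vec[i] = saved[perm[i]];
}
```

For a two-lane swapped load (`load_perm = {1, 0}`), reversing with `perm = {1, 0}` turns the load permutation into the identity while stmt 10 still pairs with element 1 and stmt 11 with element 0; permuting only one of the two vectors would break that pairing.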

2021-06-28  Richard Biener  

PR tree-optimization/101207
* tree-vect-slp.c (vect_optimize_slp): Do BB reduction
permute eliding for load permutations properly.

* gcc.dg/vect/bb-slp-pr101207.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr101207.c | 25 ++
 gcc/tree-vect-slp.c | 88 +++--
 2 files changed, 71 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101207.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101207.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101207.c
new file mode 100644
index 000..1f51d66a5fe
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101207.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ffast-math" } */
+
+#include "tree-vect.h"
+
+double a[2];
+double x, y;
+
+void __attribute__((noipa)) foo ()
+{
+  x = a[1] - a[0];
+  y = a[0] + a[1];
+}
+
+int main()
+{
+  check_vect ();
+
+  a[0] = 0.;
+  a[1] = 1.;
+  foo ();
+  if (x != 1. || y != 1.)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 17fe5f23c09..5401dbe4d5e 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3921,6 +3921,52 @@ vect_optimize_slp (vec_info *vinfo)
}
 }
 
+  /* Elide any permutations at BB reduction roots.  */
+  if (is_a  (vinfo))
+{
+  for (slp_instance instance : vinfo->slp_instances)
+   {
+ if (SLP_INSTANCE_KIND (instance) != slp_inst_kind_bb_reduc)
+   continue;
+ slp_tree old = SLP_INSTANCE_TREE (instance);
+ if (SLP_TREE_CODE (old) == VEC_PERM_EXPR
+ && SLP_TREE_CHILDREN (old).length () == 1)
+   {
+ slp_tree child = SLP_TREE_CHILDREN (old)[0];
+ if (SLP_TREE_DEF_TYPE (child) == vect_external_def)
+   {
+ /* Preserve the special VEC_PERM we use to shield existing
+vector defs from the rest.  But make it a no-op.  */
+ unsigned i = 0;
+ for (std::pair &p
+  : SLP_TREE_LANE_PERMUTATION (old))
+   p.second = i++;
+   }
+ else
+   {
+ SLP_INSTANCE_TREE (instance) = child;
+ SLP_TREE_REF_COUNT (child)++;
+ vect_free_slp_tree (old);
+   }
+   }
+ else if (SLP_TREE_LOAD_PERMUTATION (old).exists ()
+  && SLP_TREE_REF_COUNT (old) == 1
+  && vertices[old->vertex].materialize)
+   {
+ /* ???  For loads the situation is more complex since
+we can't modify the permute in place in case the
+node is used multiple times.  In fact for loads this
+should be somehow handled in the propagation engine.  */
+ /* Apply the reverse permutation to our stmts.  */
+ int perm = vertices[old->vertex].get_perm_in ();
+ vect_slp_permute (perms[perm],
+   SLP_TREE_SCALAR_STMTS (old), true);
+ vect_slp_permute (perms[perm],
+   SLP_TREE_LOAD_PERMUTATION (old), true);
+   }
+   }
+}
+
   /* Free the perms vector used for propagation.  */
   while (!perms.is_empty ())
 perms.pop ().release ();
@@ -3987,48 +4033,6 @@ vect_optimize_slp (vec_info *vinfo)
}
}
 }
-
-  /* And any permutations of BB reductions.  */
-  if (is_a  (vinfo))
-{
-  for (slp_instance instance : vinfo->slp_instances)
-   {
- if (SLP_INSTANCE_KIND (instance) != slp_inst_kind_bb_reduc)
-   continue;
- slp_tree old = SLP_INSTANCE_TREE (instance);
- if (SLP_TREE_CODE (old) == VEC_PERM_EXPR
- && SLP_TREE_CHILDREN (old).length () == 1)
-   {
- slp_tree child = SLP_TREE_CHILDREN (old)[0];
- if (SLP_TREE_DEF_TYPE (child) == vect_external_def)
-   {
- /* Preserve the special VEC_PERM we use to shield existing
-vector defs from the rest.  But make it a no-op.  */
- unsigned i = 0;
- for (std::pair &p
-  : SLP_TREE_LANE_PERMUTATION (old))
-   p.second = i++;
-   }
- else
-   {
- SLP_INSTANCE_TREE (instance) = child;
- SLP_TREE_REF_COUNT (child)++;
- vect_free_slp_tree (old);
-   }
-   }
- else if (SLP_TREE_LOAD_PERMUTATION (old).exists ()
-  && SLP_TREE_REF_COUN

Re: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-06-28 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Prathamesh Kulkarni 
> > Sent: 14 June 2021 09:02
> > To: Christophe Lyon 
> > Cc: gcc Patches ; Kyrylo Tkachov
> > 
> > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> > constructor
> >
> > On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Fri, 4 Jun 2021 at 13:15, Christophe Lyon 
> > wrote:
> > > >
> > > > On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > > As mentioned in PR, for the following test-case:
> > > > >
> > > > > #include 
> > > > >
> > > > > bfloat16x4_t f1 (bfloat16_t a)
> > > > > {
> > > > >   return vdup_n_bf16 (a);
> > > > > }
> > > > >
> > > > > bfloat16x4_t f2 (bfloat16_t a)
> > > > > {
> > > > >   return (bfloat16x4_t) {a, a, a, a};
> > > > > }
> > > > >
> > > > > Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> > > > > -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> > > > >
> > > > > f1:
> > > > > vdup.16 d16, r0
> > > > > vmov r0, r1, d16  @ v4bf
> > > > > bx  lr
> > > > >
> > > > > f2:
> > > > > mov r3, r0  @ __bf16
> > > > > adr r1, .L4
> > > > > ldrd r0, [r1]
> > > > > mov r2, r3  @ __bf16
> > > > > mov ip, r3  @ __bf16
> > > > > bfi r1, r2, #0, #16
> > > > > bfi r0, ip, #0, #16
> > > > > bfi r1, r3, #16, #16
> > > > > bfi r0, r2, #16, #16
> > > > > bx  lr
> > > > >
> > > > > This seems to happen because the vec_init pattern in neon.md has the
> > > > > VDQ mode iterator, which doesn't include V4BF.  In the attached patch,
> > > > > I changed the mode to VDQX, which seems to work for the test-case, and
> > > > > the compiler now generates:
> > > > >
> > > > > f2:
> > > > > vdup.16 d16, r0
> > > > > vmov r0, r1, d16  @ v4bf
> > > > > bx  lr
> > > > >
> > > > > However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> > > > > sure if either VDQ or VDQX are correct modes for MVE, since MVE has
> > > > > only 128-bit vectors?
> > > > >
> > > >
> > > > I think patterns common to both Neon and MVE should be moved to
> > > > vec-common.md, I don't know why such patterns were left in neon.md.
> > > Since we end up calling neon_expand_vector_init for both NEON and MVE,
> > > I am not sure if we should separate the pattern?
> > > Would it make sense to FAIL if the mode size isn't 16 bytes for MVE, as
> > > in the attached patch, so that it will call neon_expand_vector_init only
> > > for 128-bit vectors?  Although hard-coding 16 in the pattern doesn't seem
> > > a good idea to me either.
> > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> > (attaching patch as text).
> >
>
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -459,10 +459,12 @@
>  )
>
>  (define_expand "vec_init"
> -  [(match_operand:VDQ 0 "s_register_operand")
> +  [(match_operand:VDQX 0 "s_register_operand")
> (match_operand 1 "" "")]
>"TARGET_NEON || TARGET_HAVE_MVE"
>  {
> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> +FAIL;
>neon_expand_vector_init (operands[0], operands[1]);
>DONE;
>  })
>
> I think we should move this to vec-common.md like Christophe said.
> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in 
> the expander condition?
> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< VDQ>mode) != 16)"
Is it OK to use mode? Using mode resulted in a lot
of build errors.
Also, I think the comparison should be inverted, i.e. GET_MODE_SIZE
(mode) == 16, since we want the pattern to be enabled if the target is
MVE and the vector size is 16 bytes?
Do the changes in the attached patch look OK?
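The inversion question can be checked with a tiny truth table. This is a hypothetical C++ model of the expander condition, not the `.md` syntax itself (the real condition uses GCC's target macros and mode iterators). The intended behaviour is: always enabled for Neon, and for MVE only for 16-byte (128-bit) vector modes, since, as noted in the thread, MVE only has 128-bit vectors.

```cpp
#include <cassert>

// Hypothetical model of the condition with "== 16".
bool vec_init_enabled (bool target_neon, bool target_have_mve,
                       unsigned mode_size)
{
  return target_neon || (target_have_mve && mode_size == 16);
}

// The same condition as first suggested in the review, with "!= 16".
bool vec_init_enabled_ne (bool target_neon, bool target_have_mve,
                          unsigned mode_size)
{
  return target_neon || (target_have_mve && mode_size != 16);
}
```

On an MVE-only target the `!= 16` form enables exactly the 64-bit modes MVE lacks and disables the 128-bit ones, which is why the comparison needs to be `== 16`.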

Thanks,
Prathamesh
>
> Thanks,
> Kyrill
>
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > That being said, I suggest you look at other similar patterns in
> > > > vec-common.md, most of which are gated on
> > > > ARM_HAVE__ARITH
> > > > and possibly beware of issues with iwmmxt :-)
> > > >
> > > > Christophe
> > > >
> > > > > Thanks,
> > > > > Prathamesh


pr98435-3.diff
Description: Binary data


Re: [PATCH 1/4] Duplicate the range information of the phi onto the new ssa_name

2021-06-28 Thread Richard Biener via Gcc-patches
On Mon, Jun 28, 2021 at 1:26 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> Since match_simplify_replacement uses gimple_simplify, a new SSA name
> is sometimes created, and when we then replace the PHI edge with this
> new SSA name, the range information on the PHI is lost.
> Placing the fix in replace_phi_edge_with_variable is the best option,
> instead of doing it each time replace_phi_edge_with_variable is called,
> which is what is done today.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.
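A source-level example of the situation the new comment in the patch describes. It is hypothetical and not taken from the patch's testsuite; whether phiopt sees a PHI for this exact function depends on earlier passes. Here the PHI merging `x` and `255` can become `MIN_EXPR <x, 255>`, and the range recorded for the PHI result would be [INT_MIN, 255], the `range<-INF,255>` in the comment, which the patch now copies onto the new SSA name.

```cpp
#include <cassert>
#include <climits>

// Hypothetical source of the kind minmax_replacement handles: the value
// of r at the return is a PHI of x (when x < 255) and 255 (otherwise).
int clamp_to_255 (int x)
{
  int r;
  if (x < 255)
    r = x;
  else
    r = 255;
  return r;
}
```

Because the condition restricts the PHI to exactly two incoming edges, the new SSA name fully replaces the PHI result, so the PHI's range is also valid for the new name.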

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.c (replace_phi_edge_with_variable): Duplicate range
> info if we're the only things setting the target PHI.
> (value_replacement): Don't duplicate range here.
> (minmax_replacement): Likewise.
> ---
>  gcc/tree-ssa-phiopt.c | 43 ++-
>  1 file changed, 26 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index 1777bff2f7c..ab12e85569d 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -391,6 +391,32 @@ replace_phi_edge_with_variable (basic_block cond_block,
>basic_block bb = gimple_bb (phi);
>basic_block block_to_remove;
>gimple_stmt_iterator gsi;
> +  tree phi_result = PHI_RESULT (phi);
> +
> +  /* Duplicate range info if we're the only things setting the target PHI.
> + This is needed as later on, the new_tree will be replacing
> + The assignement of the PHI.
> + For an example:
> + bb1:
> + _4 = min
> + goto bb2
> +
> + range<-INF,255>
> + a_3 = PHI<_4(1)>
> + bb3:
> +
> + use(a_3)
> + And _4 gets prograted into the use of a_3 and losing the range info.
> + This can't be done for more than 2 incoming edges as the progration
> + won't happen.  */
> +  if (TREE_CODE (new_tree) == SSA_NAME
> +  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
> +  && INTEGRAL_TYPE_P (TREE_TYPE (phi_result))
> +  && !SSA_NAME_RANGE_INFO (new_tree)
> +  && SSA_NAME_RANGE_INFO (phi_result))
> +duplicate_ssa_name_range_info (new_tree,
> +  SSA_NAME_RANGE_TYPE (phi_result),
> +  SSA_NAME_RANGE_INFO (phi_result));
>
>/* Change the PHI argument to new.  */
>SET_USE (PHI_ARG_DEF_PTR (phi, e->dest_idx), new_tree);
> @@ -1385,16 +1411,6 @@ value_replacement (basic_block cond_bb, basic_block middle_bb,
>:
># u_3 = PHI   */
>reset_flow_sensitive_info (lhs);
> -  if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
> -   {
> - /* If available, we can use VR of phi result at least.  */
> - tree phires = gimple_phi_result (phi);
> - struct range_info_def *phires_range_info
> -   = SSA_NAME_RANGE_INFO (phires);
> - if (phires_range_info)
> -   duplicate_ssa_name_range_info (lhs, SSA_NAME_RANGE_TYPE (phires),
> -  phires_range_info);
> -   }
>gimple_stmt_iterator gsi_from;
>for (int i = prep_cnt - 1; i >= 0; --i)
> {
> @@ -1794,13 +1810,6 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb,
>gimple_seq stmts = NULL;
>tree phi_result = PHI_RESULT (phi);
>result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
> -  /* Duplicate range info if we're the only things setting the target PHI.  */
> -  if (!gimple_seq_empty_p (stmts)
> -  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
> -  && !POINTER_TYPE_P (TREE_TYPE (phi_result))
> -  && SSA_NAME_RANGE_INFO (phi_result))
> -duplicate_ssa_name_range_info (result, SSA_NAME_RANGE_TYPE (phi_result),
> -  SSA_NAME_RANGE_INFO (phi_result));
>
>gsi = gsi_last_bb (cond_bb);
>gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
> --
> 2.27.0
>
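To make the range argument concrete, here is a minimal C sketch of the source shape the patch comment describes: a conditional whose PHI merges `a` and 255, which minmax_replacement can turn into `MIN_EXPR <a, 255>`. It is an illustration under assumptions, not GCC output: `clamp_phi` and the helper `min_int` are hypothetical names, and whether GCC produces exactly the GIMPLE from the comment for this function is not verified here. The point is only that the MIN result takes the same values as the PHI result, so the PHI's range (at most 255) is also valid for the new SSA name.

```c
#include <limits.h>

/* Source shape handled by minmax_replacement: the PHI for r merges
   a (from the then-arm) and the constant 255 (from the else-arm).
   The range of r is therefore [INT_MIN, 255].  */
int clamp_phi (int a)
{
  int r;
  if (a < 255)
    r = a;
  else
    r = 255;
  return r;
}

/* Hypothetical helper spelling out MIN_EXPR <a, b>.  min_int (a, 255)
   always equals clamp_phi (a), so it has the same range.  */
int min_int (int a, int b)
{
  return a < b ? a : b;
}
```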


Re: [PATCH 2/4] Allow match-and-simplified phiopt to run in early phiopt

2021-06-28 Thread Richard Biener via Gcc-patches
On Mon, Jun 28, 2021 at 1:27 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> To move a few things more to match-and-simplify from phiopt,
> we need to allow match_simplify_replacement to run in early
> phiopt. To do this we add a replacement for gimple_simplify
> that is explicitly for phiopt.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no
> regressions.

OK.

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.c (match_simplify_replacement):
> Add early_p argument. Call gimple_simplify_phiopt
> instead of gimple_simplify.
> (tree_ssa_phiopt_worker): Update call to
> match_simplify_replacement and allow unconditionally.
> (phiopt_early_allow): New function.
> (gimple_simplify_phiopt): New function.
> ---
>  gcc/tree-ssa-phiopt.c | 89 ++-
>  1 file changed, 70 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index ab12e85569d..17bc597851b 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -50,12 +50,13 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-fold.h"
>  #include "internal-fn.h"
>  #include "gimple-range.h"
> +#include "gimple-match.h"
>
>  static unsigned int tree_ssa_phiopt_worker (bool, bool, bool);
>  static bool two_value_replacement (basic_block, basic_block, edge, gphi *,
>tree, tree);
>  static bool match_simplify_replacement (basic_block, basic_block,
> -   edge, edge, gphi *, tree, tree);
> +   edge, edge, gphi *, tree, tree, bool);
>  static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, tree,
> gimple *);
>  static int value_replacement (basic_block, basic_block,
> @@ -345,9 +346,9 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p)
>   /* Do the replacement of conditional if it can be done.  */
>   if (!early_p && two_value_replacement (bb, bb1, e2, phi, arg0, arg1))
> cfgchanged = true;
> - else if (!early_p
> -  && match_simplify_replacement (bb, bb1, e1, e2, phi,
> - arg0, arg1))
> + else if (match_simplify_replacement (bb, bb1, e1, e2, phi,
> +  arg0, arg1,
> +  early_p))
> cfgchanged = true;
>   else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
> cfgchanged = true;
> @@ -811,6 +812,67 @@ two_value_replacement (basic_block cond_bb, basic_block middle_bb,
>return true;
>  }
>
> +/* Return TRUE if CODE should be allowed during early phiopt.
> +   Currently this is to allow MIN/MAX and ABS/NEGATE.  */
> +static bool
> +phiopt_early_allow (enum tree_code code)
> +{
> +  switch (code)
> +{
> +  case MIN_EXPR:
> +  case MAX_EXPR:
> +  case ABS_EXPR:
> +  case ABSU_EXPR:
> +  case NEGATE_EXPR:
> +  case SSA_NAME:
> +   return true;
> +  default:
> +   return false;
> +}
> +}
> +
> +/* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT.
> +   Return NULL if nothing can be simplified or the resulting simplified value
> +   with parts pushed if EARLY_P was true. Also rejects non allowed tree code
> +   if EARLY_P is set.
> +   Takes the comparison from COMP_STMT and two args, ARG0 and ARG1 and tries
> +   to simplify CMP ? ARG0 : ARG1.  */
> +static tree
> +gimple_simplify_phiopt (bool early_p, tree type, gimple *comp_stmt,
> +   tree arg0, tree arg1,
> +   gimple_seq *seq)
> +{
> +  tree result;
> +  enum tree_code comp_code = gimple_cond_code (comp_stmt);
> +  location_t loc = gimple_location (comp_stmt);
> +  tree cmp0 = gimple_cond_lhs (comp_stmt);
> +  tree cmp1 = gimple_cond_rhs (comp_stmt);
> +  /* To handle special cases like floating point comparison, it is easier and
> + less error-prone to build a tree and gimplify it on the fly though it is
> + less efficient.
> + Don't use fold_build2 here as that might create (bool)a instead of just
> + "a != 0".  */
> +  tree cond = build2_loc (loc, comp_code, boolean_type_node,
> + cmp0, cmp1);
> +  gimple_match_op op (gimple_match_cond::UNCOND,
> + COND_EXPR, type, cond, arg0, arg1);
> +
> +  if (op.resimplify (early_p ? NULL : seq, follow_all_ssa_edges))
> +{
> +  /* Early we want only to allow some generated tree codes. */
> +  if (!early_p
> + || op.code.is_tree_code ()
> + || phiopt_early_allow ((tree_code)op.code))
> +   {
> + result = maybe_push_res_to_seq (&op, seq);
> + if (result)
> +   return result;
> +   }
> +}
> +
> +  return NULL;
> +}
> +
>  /*  The 
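The early-phiopt filter can be pictured as a small predicate. The C sketch below is hypothetical: `enum sketch_code` stands in for GCC's `enum tree_code` and lists only the codes named in `phiopt_early_allow`, plus two that are not allowed. It models the intent stated in the patch comment ("Early we want only to allow some generated tree codes"): late phiopt accepts any simplification result, while early phiopt accepts only the allow-listed codes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for enum tree_code, limited to the codes
   relevant to the allow-list plus two that are rejected early.  */
enum sketch_code
{
  SK_MIN_EXPR, SK_MAX_EXPR, SK_ABS_EXPR, SK_ABSU_EXPR, SK_NEGATE_EXPR,
  SK_SSA_NAME, SK_PLUS_EXPR, SK_COND_EXPR
};

/* Mirrors the switch in phiopt_early_allow.  */
bool sketch_early_allow (enum sketch_code code)
{
  switch (code)
    {
    case SK_MIN_EXPR:
    case SK_MAX_EXPR:
    case SK_ABS_EXPR:
    case SK_ABSU_EXPR:
    case SK_NEGATE_EXPR:
    case SK_SSA_NAME:
      return true;
    default:
      return false;
    }
}

/* Accept any result late; only allow-listed codes early.  */
bool sketch_accept_result (bool early_p, enum sketch_code code)
{
  return !early_p || sketch_early_allow (code);
}
```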

RE: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-06-28 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 28 June 2021 09:38
> To: Kyrylo Tkachov 
> Cc: Christophe Lyon ; gcc Patches  patc...@gcc.gnu.org>
> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> constructor
> 
> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov 
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Prathamesh Kulkarni 
> > > Sent: 14 June 2021 09:02
> > > To: Christophe Lyon 
> > > Cc: gcc Patches ; Kyrylo Tkachov
> > > 
> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> > > constructor
> > >
> > > On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> > > > wrote:
> > > > >
> > > > > On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > > As mentioned in PR, for the following test-case:
> > > > > >
> > > > > > #include 
> > > > > >
> > > > > > bfloat16x4_t f1 (bfloat16_t a)
> > > > > > {
> > > > > >   return vdup_n_bf16 (a);
> > > > > > }
> > > > > >
> > > > > > bfloat16x4_t f2 (bfloat16_t a)
> > > > > > {
> > > > > >   return (bfloat16x4_t) {a, a, a, a};
> > > > > > }
> > > > > >
> > > > > > Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> > > > > > -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> > > > > >
> > > > > > f1:
> > > > > > vdup.16 d16, r0
> > > > > > vmov r0, r1, d16  @ v4bf
> > > > > > bx  lr
> > > > > >
> > > > > > f2:
> > > > > > mov r3, r0  @ __bf16
> > > > > > adr r1, .L4
> > > > > > ldrd r0, [r1]
> > > > > > mov r2, r3  @ __bf16
> > > > > > mov ip, r3  @ __bf16
> > > > > > bfi r1, r2, #0, #16
> > > > > > bfi r0, ip, #0, #16
> > > > > > bfi r1, r3, #16, #16
> > > > > > bfi r0, r2, #16, #16
> > > > > > bx  lr
> > > > > >
> > > > > > This seems to happen because vec_init pattern in neon.md has VDQ mode
> > > > > > iterator, which doesn't include V4BF. In attached patch, I changed
> > > > > > mode to VDQX which seems to work for the test-case, and the compiler
> > > > > > now generates:
> > > > > >
> > > > > > f2:
> > > > > > vdup.16 d16, r0
> > > > > > vmov r0, r1, d16  @ v4bf
> > > > > > bx  lr
> > > > > >
> > > > > > However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> > > > > > sure if either VDQ or VDQX are correct modes for MVE since MVE has
> > > > > > only 128-bit vectors ?
> > > > > >
> > > > >
> > > > > I think patterns common to both Neon and MVE should be moved to
> > > > > vec-common.md, I don't know why such patterns were left in neon.md.
> > > > Since we end up calling neon_expand_vector_init for both NEON and MVE,
> > > > I am not sure if we should separate the pattern ?
> > > > Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> > > > in attached patch so
> > > > it will call neon_expand_vector_init only for 128-bit vectors ?
> > > > Altho hard-coding 16 in the pattern doesn't seem a good idea to me either.
> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> > > (attaching patch as text).
> > >
> >
> > --- a/gcc/config/arm/neon.md
> > +++ b/gcc/config/arm/neon.md
> > @@ -459,10 +459,12 @@
> >  )
> >
> >  (define_expand "vec_init"
> > -  [(match_operand:VDQ 0 "s_register_operand")
> > +  [(match_operand:VDQX 0 "s_register_operand")
> > (match_operand 1 "" "")]
> >"TARGET_NEON || TARGET_HAVE_MVE"
> >  {
> > +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> > +FAIL;
> >neon_expand_vector_init (operands[0], operands[1]);
> >DONE;
> >  })
> >
> > I think we should move this to vec-common.md like Christophe said.
> > Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it
> > in the expander condition?
> > "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> Is it OK to use mode ? Because using mode resulted in lot
> of build errors.
> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE
> (mode) == 16 since we want to make the pattern pass if target is MVE and
> vector size is 16 bytes ?
> Do these changes in attached patch look OK ?

Yes, you're right.
Ok.
Thanks,
Kyrill


> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Kyrill
> >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > That being said, I suggest you look at other similar patterns in
> > > > > vec-common.md, most of which are gated on
> > > > > ARM_HAVE__ARITH
> > > > > and possibly beware of issues with iwmmxt :-)
> > > > >
> > > > > Christophe
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
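For readers without an Arm toolchain, the same constructor can be written with GCC's generic vector extension. This is a sketch under stated assumptions: `int16_t` stands in for `bfloat16_t`, and `v4hi`/`splat4` are hypothetical names, so it only shows what `{a, a, a, a}` means (a 64-bit vector with every lane equal to `a`), not which instructions the Arm backend emits. The thread's point is that once `vec_init` accepts the mode (per the report, VDQX covers V4BF where VDQ does not), such a splat can be expanded to a single `vdup.16` as in `f1`.

```c
#include <stdint.h>

/* GCC generic vector: four 16-bit lanes in 8 bytes, the same shape
   as bfloat16x4_t.  int16_t is only a stand-in for bfloat16_t.  */
typedef int16_t v4hi __attribute__ ((vector_size (8)));

/* Same form as f2 in the report: a constructor whose four elements
   are identical, i.e. a splat (what vdup_n_bf16 does in f1).  */
v4hi splat4 (int16_t a)
{
  return (v4hi) { a, a, a, a };
}
```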


Re: [PATCH 3/4] Try inverted comparison for match_simplify in phiopt

2021-06-28 Thread Richard Biener via Gcc-patches
On Mon, Jun 28, 2021 at 1:28 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> Since match and simplify does not have all of the inverted
> comparison patterns, it makes sense to just have
> phi-opt try to do the inversion and try match and simplify again.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK with the comment fix suggested by Bernhard.  I think
if the match fails you have to discard the sequence
somewhere, since it can still end up with partly pushed
stmts - otherwise you'll slowly leak SSA names.  There's
gimple_seq_discard () for this.  It's probably best to do it
in gimple_simplify_phiopt after each failed try.

Richard.

> Thanks,
> Andrew Pinski
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.c (gimple_simplify_phiopt):
> If "A ? B : C" fails to simplify, try "(!A) ? C : B".
> ---
>  gcc/tree-ssa-phiopt.c | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index 17bc597851b..9bda1b2a397 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -836,7 +836,8 @@ phiopt_early_allow (enum tree_code code)
> with parts pushed if EARLY_P was true. Also rejects non allowed tree code
> if EARLY_P is set.
> Takes the comparison from COMP_STMT and two args, ARG0 and ARG1 and tries
> -   to simplify CMP ? ARG0 : ARG1.  */
> +   to simplify CMP ? ARG0 : ARG1.
> +   Also try to simplify (!CMP) ? ARG0 : ARG1 if the non-inverse failed.  */
>  static tree
>  gimple_simplify_phiopt (bool early_p, tree type, gimple *comp_stmt,
> tree arg0, tree arg1,
> @@ -869,6 +870,30 @@ gimple_simplify_phiopt (bool early_p, tree type, gimple *comp_stmt,
> return result;
> }
>  }
> +  /* Try the inverted comparison, that is !COMP ? ARG1 : ARG0. */
> +  comp_code = invert_tree_comparison (comp_code, HONOR_NANS (cmp0));
> +
> +  if (comp_code == ERROR_MARK)
> +return NULL;
> +
> +  cond = build2_loc (loc,
> +comp_code, boolean_type_node,
> +cmp0, cmp1);
> +  gimple_match_op op1 (gimple_match_cond::UNCOND,
> +  COND_EXPR, type, cond, arg1, arg0);
> +
> +  if (op1.resimplify (early_p ? NULL : seq, follow_all_ssa_edges))
> +{
> +  /* Early we want only to allow some generated tree codes. */
> +  if (!early_p
> + || op1.code.is_tree_code ()
> + || phiopt_early_allow ((tree_code)op1.code))
> +   {
> + result = maybe_push_res_to_seq (&op1, seq);
> + if (result)
> +   return result;
> +   }
> +}
>
>return NULL;
>  }
> --
> 2.27.0
>
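Why the inversion needs care can be shown in plain C. Swapping the arms of `A ? B : C` into `!A ? C : B` is always value-preserving; the subtle part is expressing `!(x < y)` as a single comparison code. This sketch (all function names hypothetical) shows that with a NaN operand `x >= y` is not the inverse of `x < y`; the correct inverse is the unordered form UNGE. To my understanding, that is why `invert_tree_comparison` takes `HONOR_NANS (cmp0)`, and it may return `ERROR_MARK` (for example when trapping math is also in effect), which the patch checks for.

```c
#include <math.h>
#include <stdbool.h>

/* "A ? B : C" and "!A ? C : B" agree for every A.  */
double select_plain (bool a, double b, double c) { return a ? b : c; }
double select_inverted (bool a, double b, double c) { return !a ? c : b; }

/* Naive inverse of x < y: wrong when x or y is NaN, because then both
   x < y and x >= y are false.  */
bool inverted_lt_naive (double x, double y) { return x >= y; }

/* Correct inverse of x < y: "unordered or greater-or-equal" (UNGE).  */
bool inverted_lt_correct (double x, double y)
{
  return isunordered (x, y) || x >= y;
}
```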


Re: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-06-28 Thread Christophe LYON via Gcc-patches



On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:



> > > I think we should move this to vec-common.md like Christophe said.
> > > Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it
> > > in the expander condition?
> > > "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> > Is it OK to use mode ? Because using mode resulted in lot
> > of build errors.
> > Also, I think the comparison should be inverted, ie, GET_MODE_SIZE
> > (mode) == 16 since we want to make the pattern pass if target is MVE and
> > vector size is 16 bytes ?
> > Do these changes in attached patch look OK ?
>
> Yes, you're right.



Can't this be ARM_HAVE__ARITH like in most expanders in vec-common.md?

(maybe with a && !TARGET_REALLY_IWMMXT if needed)


Christophe



> Ok.
> Thanks,
> Kyrill





Re: [PATCH 4/4] Port most of the A CMP 0 ? A : -A to match

2021-06-28 Thread Richard Biener via Gcc-patches
On Mon, Jun 28, 2021 at 1:29 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> To improve phiopt and be able to remove abs_replacement, this ports
> most of "A CMP 0 ? A : -A" from fold_cond_expr_with_comparison to
> match.pd.  There are a few extra changes that are needed to remove
> the "A CMP 0 ? A : -A" part from fold_cond_expr_with_comparison:
>* Need to handle (A - B) case
>* Need to handle UN* comparisons.
>
> I will handle those in a different patch.
>
> Note phi-opt-15.c test needed to be updated as we get ABSU now
> instead of not getting ABS.  When ABSU was added phiopt was not
> updated even to use ABSU instead of not creating ABS.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
> * match.pd (A CMP 0 ? A : -A): New patterns.
> * tree-ssa-phiopt.c (abs_replacement): Delete function.
> (tree_ssa_phiopt_worker): Don't call abs_replacement.
> Update comment about abs_replacement.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/phi-opt-15.c: Update test to expect
> ABSU and still not expect ABS_EXPR.
> ---
>  gcc/match.pd   |  60 +
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c |   4 +-
>  gcc/tree-ssa-phiopt.c  | 134 +
>  3 files changed, 64 insertions(+), 134 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 39fb57ee1f4..0c790dfa741 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3976,6 +3976,66 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(cnd (logical_inverted_value truth_valued_p@0) @1 @2)
>(cnd @0 @2 @1)))
>
> +/* abs/negative simplifications moved from fold_cond_expr_with_comparison,
> +   Need to handle (A - B) case as fold_cond_expr_with_comparison does.
> +   Need to handle UN* comparisons.
> +
> +   None of these transformations work for modes with signed
> +   zeros.  If A is +/-0, the first two transformations will
> +   change the sign of the result (from +0 to -0, or vice
> +   versa).  The last four will fix the sign of the result,
> +   even though the original expressions could be positive or
> +   negative, depending on the sign of A.
> +
> +   Note that all these transformations are correct if A is
> +   NaN, since the two alternatives (A and -A) are also NaNs.  */
> +
> +(for cnd (cond vec_cond)
> + /* A == 0? A : -Asame as -A */
> + (for cmp (eq uneq)
> +  (simplify
> +   (cnd (cmp @0 zerop) @0 (negate@1 @0))
> +(if (!HONOR_SIGNED_ZEROS (element_mode (type)))

I think you can drop element_mode () calls, the HONOR_* stuff
should work on compound types as well.

> + @1))
> +  (simplify
> +   (cnd (cmp @0 zerop) zerop (negate@1 @0))

So why do we need this special case?  zerop matches both
-0. and 0. but with constants and !HONOR_SIGNED_ZEROS
operand_equal_p as used by match should make that equal
to a mismatching sign zero in the (negate..) arm as well?  And
of course the negate should have been constant folded then.

Same for the other cases below, otherwise looks OK to me.

Thanks,
Richard.

> +(if (!HONOR_SIGNED_ZEROS (element_mode (type)))
> + @1))
> + )
> + /* A != 0? A : -Asame as A */
> + (for cmp (ne ltgt)
> +  (simplify
> +   (cnd (cmp @0 zerop) @0 (negate @0))
> +(if (!HONOR_SIGNED_ZEROS (element_mode (type)))
> + @0))
> +  (simplify
> +   (cnd (cmp @0 zerop) @0 zerop)
> +(if (!HONOR_SIGNED_ZEROS (element_mode (type)))
> + @0))
> + )
> + /* A >=/> 0? A : -Asame as abs (A) */
> + (for cmp (ge gt)
> +  (simplify
> +   (cnd (cmp @0 zerop) @0 (negate @0))
> +(if (!HONOR_SIGNED_ZEROS (element_mode (type))
> +&& !TYPE_UNSIGNED (type))
> + (abs @0
> + /* A <=/< 0? A : -Asame as -abs (A) */
> + (for cmp (le lt)
> +  (simplify
> +   (cnd (cmp @0 zerop) @0 (negate @0))
> +(if (!HONOR_SIGNED_ZEROS (element_mode (type))
> +&& !TYPE_UNSIGNED (type))
> + (if (ANY_INTEGRAL_TYPE_P (type)
> + && !TYPE_OVERFLOW_WRAPS (type))
> +  (with {
> +   tree utype = unsigned_type_for (type);
> +   }
> +   (convert (negate (absu:utype @0
> +   (negate (abs @0)
> + )
> +)
> +
>  /* -(type)!A -> (type)A - 1.  */
>  (simplify
>   (negate (convert?:s (logical_inverted_value:s @0)))
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c
> index ac3018ef533..6aec68961cf 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c
> @@ -9,4 +9,6 @@ foo (int i)
>return i;
>  }
>
> -/* { dg-final { scan-tree-dump-not "ABS" "optimized" } } */
> +/* We should not have ABS_EXPR but ABSU_EXPR instead. */
> +/* { dg-final { scan-tree-dump-not "ABS_EXPR" "optimized" } } */
> +/* { dg-final { scan-tree-dump "ABSU" "optimized" } } */
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index 9bda1b2a397..97540e30d55 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tre
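A small C sketch of the identities behind the new match.pd patterns, with hypothetical function names. For `int`, note that `cond_eq` and `cond_ge` already evaluate `-a` when `a` is `INT_MIN` (undefined behaviour), whereas `cond_le` never does, so rewriting it to `-abs (a)` must not introduce an overflow; that is the reason the le/lt pattern goes through ABSU in an unsigned type when overflow does not wrap. `absu` below is an assumed model of ABSU_EXPR, not GCC code. The last two functions illustrate the signed-zero caveat in the comment: for `+0.0`, `A == 0 ? A : -A` yields `+0.0`, while the rewritten `-A` yields `-0.0`.

```c
#include <limits.h>
#include <math.h>

/* Integer forms of "A CMP 0 ? A : -A" and their rewrites.  */
int cond_eq (int a) { return a == 0 ? a : -a; }  /* same as -a       */
int cond_ne (int a) { return a != 0 ? a : -a; }  /* same as a        */
int cond_ge (int a) { return a >= 0 ? a : -a; }  /* same as abs (a)  */
int cond_le (int a) { return a <= 0 ? a : -a; }  /* same as -abs (a) */

/* Hypothetical model of ABSU_EXPR: |a| as unsigned, defined even for
   INT_MIN because the negation happens in unsigned arithmetic.  */
unsigned absu (int a)
{
  return a < 0 ? 0u - (unsigned) a : (unsigned) a;
}

/* -abs (a) computed as (int) -absu (a), never overflowing.  The final
   unsigned-to-int conversion of an out-of-range value is
   implementation-defined; GCC documents it as reduction modulo 2^N.  */
int neg_abs_via_absu (int a)
{
  return (int) (0u - absu (a));
}

/* Signed zeros: cond_eq_d (+0.0) is +0.0, but neg_d (+0.0) is -0.0,
   hence the !HONOR_SIGNED_ZEROS guard on the patterns.  */
double cond_eq_d (double a) { return a == 0 ? a : -a; }
double neg_d (double a) { return -a; }
```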

[PATCH] v850: silent 2 warnings

2021-06-28 Thread Martin Liška

Hello.

Tested and approved by Jeff.

Martin

Silences:

/home/marxin/Programming/gcc/gcc/config/v850/v850.c: In function ‘char* construct_dispose_instruction(rtx)’:
/home/marxin/Programming/gcc/gcc/config/v850/v850.c:2690:22: warning: ‘%s’ directive writing up to 99 bytes into a region of size between 79 and 89 [-Wformat-overflow=]
 2690 |   sprintf (buff, "dispose %d {%s}, r31", stack_bytes / 4, regs);
  |  ^~
/home/marxin/Programming/gcc/gcc/config/v850/v850.c:2690:15: note: ‘sprintf’ output between 18 and 127 bytes into a destination of size 100
 2690 |   sprintf (buff, "dispose %d {%s}, r31", stack_bytes / 4, regs);
  |   ^
/home/marxin/Programming/gcc/gcc/config/v850/v850.c: In function ‘char* construct_prepare_instruction(rtx)’:
/home/marxin/Programming/gcc/gcc/config/v850/v850.c:2814:22: warning: ‘%s’ directive writing up to 99 bytes into a region of size 91 [-Wformat-overflow=]
 2814 |   sprintf (buff, "prepare {%s}, %d", regs, (- stack_bytes) / 4);
  |  ^~
/home/marxin/Programming/gcc/gcc/config/v850/v850.c:2814:15: note: ‘sprintf’ output between 14 and 123 bytes into a destination of size 100
 2814 |   sprintf (buff, "prepare {%s}, %d", regs, (- stack_bytes) / 4);
  |   ^

gcc/ChangeLog:

* config/v850/v850.c (construct_dispose_instruction): Allocate
a bigger buffer.
(construct_prepare_instruction): Likewise.
---
 gcc/config/v850/v850.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/v850/v850.c b/gcc/config/v850/v850.c
index 371e6026e9a..4978faf9318 100644
--- a/gcc/config/v850/v850.c
+++ b/gcc/config/v850/v850.c
@@ -2583,7 +2583,7 @@ construct_dispose_instruction (rtx op)
   int                stack_bytes;
   unsigned long int  mask;
   int                i;
-  static char        buff[ 100 ]; /* XXX */
+  static char        buff[ 120 ]; /* XXX */
   int                use_callt = 0;
   
   if (count <= 2)

@@ -2704,7 +2704,7 @@ construct_prepare_instruction (rtx op)
   int                stack_bytes;
   unsigned long int  mask;
   int                i;
-  static char        buff[ 100 ]; /* XXX */
+  static char        buff[ 120 ]; /* XXX */
   int                use_callt = 0;
   
   if (XVECLEN (op, 0) <= 1)

--
2.32.0
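The numbers in the warning can be checked by hand: for the dispose format, `"dispose "` (8) + an `int` of up to 11 characters (`-2147483648`) + `" {"` (2) + up to 99 register characters + `"}, r31"` (6) + the terminating NUL gives the 127 bytes the note reports. The patch enlarges the static buffers instead; the sketch below shows a different, commonly used technique, `snprintf` with a truncation check, assuming the caller owns the buffer. `format_dispose` and `DISPOSE_BUF_SIZE` are hypothetical and not part of v850.c.

```c
#include <stdio.h>
#include <string.h>

/* Worst case reported by the -Wformat-overflow note:
   8 + 11 + 2 + 99 + 6 + 1 (NUL) = 127 bytes.  */
#define DISPOSE_BUF_SIZE 127

/* Hypothetical helper: format into a caller-provided buffer and
   return 1 if the whole string fit, 0 if it was truncated or
   snprintf failed.  snprintf never writes past SIZE bytes.  */
int format_dispose (char *buf, size_t size, int stack_bytes,
                    const char *regs)
{
  int n = snprintf (buf, size, "dispose %d {%s}, r31",
                    stack_bytes / 4, regs);
  return n >= 0 && (size_t) n < size;
}
```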



[PATCH] v850: add v850_can_inline_p target hook

2021-06-28 Thread Martin Liška

Tested and approved by Jeff.

I'm going to push it.

Martin

gcc/ChangeLog:

* config/v850/v850.c (v850_option_override): Build default
target node.
(v850_can_inline_p): New.  Allow MASK_PROLOG_FUNCTION to be
ignored for inlining.
(TARGET_CAN_INLINE_P): New.
---
 gcc/config/v850/v850.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/gcc/config/v850/v850.c b/gcc/config/v850/v850.c
index e0e5005d865..371e6026e9a 100644
--- a/gcc/config/v850/v850.c
+++ b/gcc/config/v850/v850.c
@@ -3140,6 +3140,11 @@ v850_option_override (void)
   /* The RH850 ABI does not (currently) support the use of the CALLT instruction.  */
   if (! TARGET_GCC_ABI)
 target_flags |= MASK_DISABLE_CALLT;
+
+  /* Save the initial options in case the user does function specific
+ options.  */
+  target_option_default_node = target_option_current_node
+= build_target_option_node (&global_options, &global_options_set);
 }
 
 const char *
@@ -3192,6 +3197,29 @@ v850_modes_tieable_p (machine_mode mode1, machine_mode mode2)
   return (mode1 == mode2
  || (GET_MODE_SIZE (mode1) <= 4 && GET_MODE_SIZE (mode2) <= 4));
 }
+
+static bool
+v850_can_inline_p (tree caller, tree callee)
+{
+  tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
+  tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
+
+  const unsigned HOST_WIDE_INT safe_flags = MASK_PROLOG_FUNCTION;
+
+  if (!callee_tree)
+callee_tree = target_option_default_node;
+  if (!caller_tree)
+caller_tree = target_option_default_node;
+  if (callee_tree == caller_tree)
+return true;
+
+  cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
+  cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+
+  return ((caller_opts->x_target_flags & ~safe_flags)
+ == (callee_opts->x_target_flags & ~safe_flags));
+}
+
 
 /* Initialize the GCC target structure.  */
 
@@ -3306,6 +3334,10 @@ v850_modes_tieable_p (machine_mode mode1, machine_mode mode2)

 #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_CAN_INLINE_P
+#define TARGET_CAN_INLINE_P v850_can_inline_p
+
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-v850.h"

--
2.32.0
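The core of `v850_can_inline_p` is a masked comparison of target flags. The C sketch below reduces it to plain integers; the `SK_MASK_*` bit values and `sketch_can_inline` are hypothetical stand-ins for `target_flags` masks such as `MASK_PROLOG_FUNCTION`, and the sketch omits the early return the real hook takes when caller and callee resolve to the same target option node. Caller and callee may differ only in the bits listed as safe.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for target_flags masks.  */
#define SK_MASK_PROLOG_FUNCTION (1u << 0)
#define SK_MASK_DISABLE_CALLT   (1u << 1)
#define SK_MASK_OTHER           (1u << 2)

/* Inlining is allowed when caller and callee agree on every flag
   except those known to be harmless to mix.  */
bool sketch_can_inline (unsigned caller_flags, unsigned callee_flags)
{
  const unsigned safe_flags = SK_MASK_PROLOG_FUNCTION;
  return (caller_flags & ~safe_flags) == (callee_flags & ~safe_flags);
}
```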



Re: [wwwdocs] gcc-12/changes.html: OpenMP + GCN update

2021-06-28 Thread Tobias Burnus

On 23.06.21 11:58, Andrew Stubbs wrote:

> On 23/06/2021 10:53, Tobias Burnus wrote:
>
>> +  additionally the following features which were available in C and C++
>> +  before:  depobj, mutexinoutset and
>
>
> I realise that you did not invent this awkward wording, but I'd prefer
> ...
> "the following features that were previously only available in C and C++: "


Committed as 1862b862ee18ad5ce6f7fe1f7c354c1ad95e58e8 +
https://gcc.gnu.org/gcc-12/changes.html

(I kept the awkward but more positively sounding marketing speech
instead of using the clearer, non-awkward but more negative sounding
version.)

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf


[PATCH] tree-optimization/101229 - fix vectorizer SLP hybrid detection with PHIs

2021-06-28 Thread Richard Biener
This fixes the missing handling of PHIs in walk_gimple_op, which causes
the new vectorizer SLP hybrid detection scheme to fail.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.

2021-06-28  Richard Biener  

PR tree-optimization/101229
* gimple-walk.c (gimple_walk_op): Handle PHIs.

* gcc.dg/torture/pr101229.c: New testcase.
---
 gcc/gimple-walk.c   | 24 
 gcc/testsuite/gcc.dg/torture/pr101229.c | 19 +++
 2 files changed, 43 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101229.c

diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index e4a55f1eeb6..18884c449a0 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -517,6 +517,30 @@ walk_gimple_op (gimple *stmt, walk_tree_fn callback_op,
 case GIMPLE_PREDICT:
   break;
 
+case GIMPLE_PHI:
+  /* PHIs are not GSS_WITH_OPS so we need to handle them explicitely.  */
+  {
+   gphi *phi = as_a  (stmt);
+   if (wi)
+ {
+   wi->val_only = true;
+   wi->is_lhs = true;
+ }
+   ret = walk_tree (gimple_phi_result_ptr (phi), callback_op, wi, pset);
+   if (wi)
+ wi->is_lhs = false;
+   if (ret)
+ return ret;
+   for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+ {
+   ret = walk_tree (gimple_phi_arg_def_ptr (phi, i),
+callback_op, wi, pset);
+   if (ret)
+ return ret;
+ }
+   break;
+  }
+
 default:
   {
enum gimple_statement_structure_enum gss;
diff --git a/gcc/testsuite/gcc.dg/torture/pr101229.c b/gcc/testsuite/gcc.dg/torture/pr101229.c
new file mode 100644
index 000..37080313727
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101229.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+int a[1024];
+void foo()
+{
+  for (int i; i; i += 4) {
+int suma = a[i];
+int sumb = a[i + 1];
+int sumc;
+for (unsigned j = 0; j < 77; ++j) {
+  suma = (suma ^ i) + 1;
+  sumb = (sumb ^ i) + 2;
+  sumc = suma ^ i;
+}
+a[i] = suma;
+a[i + 1] = sumb;
+a[i + 2] = sumc;
+  }
+}
-- 
2.26.2
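The control flow of the new GIMPLE_PHI case can be sketched outside GCC. Everything here is hypothetical: `struct sketch_phi` holds plain ints instead of trees, and a nonzero callback result plays the role of a non-NULL tree returned by a `walk_tree_fn`. The walker marks the PHI result as an LHS, clears the flag, then visits each argument, stopping at the first callback that asks to stop. The real code also sets `wi->val_only` and threads `pset` through, which the sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PHI node: one result, N arguments.  */
struct sketch_phi
{
  int result;
  size_t num_args;
  int args[8];
};

/* Hypothetical stand-in for the parts of walk_stmt_info used here.  */
struct sketch_walk_info
{
  bool is_lhs;    /* true while the PHI result is being visited */
  int visited;    /* operands seen so far */
  int lhs_seen;   /* operands seen with is_lhs set */
};

/* Nonzero return stops the walk.  */
typedef int (*sketch_cb) (int *op, struct sketch_walk_info *wi);

int sketch_walk_phi (struct sketch_phi *phi, sketch_cb cb,
                     struct sketch_walk_info *wi)
{
  /* The result is a definition, so flag it as an LHS.  */
  wi->is_lhs = true;
  int ret = cb (&phi->result, wi);
  wi->is_lhs = false;
  if (ret)
    return ret;

  /* Then every incoming argument, in order.  */
  for (size_t i = 0; i < phi->num_args; ++i)
    {
      ret = cb (&phi->args[i], wi);
      if (ret)
        return ret;
    }
  return 0;
}

/* Example callbacks: one counts operands, one stops at a negative.  */
int count_operands (int *op, struct sketch_walk_info *wi)
{
  (void) op;
  wi->visited++;
  if (wi->is_lhs)
    wi->lhs_seen++;
  return 0;
}

int stop_at_negative (int *op, struct sketch_walk_info *wi)
{
  wi->visited++;
  return *op < 0 ? *op : 0;
}
```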


Re: GCC documentation: porting to Sphinx

2021-06-28 Thread Arnaud Charlet
> I've got something that is very close to be a patch candidate that can be
> eventually merged. Right now, the patches are available here:
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;h=refs/users/marxin/heads/sphinx-v3

FWIW I would prefer to review the changes posted here directly with all
the details.

In particular can you explain the motivation behind all the changes in the
gcc/ada/doc directory? That's lots of moving files around, so I'd like
to understand why and make sure these are not gratuituous changes, since
move files around is always tricky and I'd rather not have to undo it
later in case this causes troubles or have unexpected consequences.

Otherwise, glad to see the switch to sphinx finally moving in gcc!

Arno


Re: [EXTERNAL] Re: rs6000: Fix typos in float128 ISA3.1 support

2021-06-28 Thread Segher Boessenkool
On Mon, Jun 28, 2021 at 04:15:15PM +0800, Kewen.Lin wrote:
> on 2021/6/25 上午3:36, Segher Boessenkool wrote:
> > mode(__TI__) is just the more portable way of writing mode(TI), the
> > latter will not work if something #define's TI (you cannot do that with
> > __TI__, you are not allowed to by the C standard, in application code).
> 
> Yeah, thanks for the note.  It looks better to update the generic
> macro with this ppc-style "__" writing and remove the ppc one. :-)
> 
> One related bug PR101235 was just opened, I noticed the culprit commit
> was backported to GCC11, is it OK to backport this fix to GCC 11 if
> everything goes well in one more week?

Please backport this immediately.  Thanks!


Segher
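A minimal C illustration of Segher's point, assuming a 64-bit target where TImode is supported (for example x86_64; on 32-bit x86 these typedefs would be rejected). An application may legally `#define TI`, which would break `mode (TI)`, but it may not define `__TI__`, so `mode (__TI__)` keeps working. The macro and typedef names are hypothetical.

```c
/* Legal in application code: TI is not a reserved identifier.
   With this macro in scope, mode (TI) would expand to
   mode (app_macro_named_TI) and fail to compile.  */
#define TI app_macro_named_TI

/* __TI__ is reserved, so a conforming program cannot #define it,
   and GCC accepts the underscored spelling of the mode name.  */
typedef int ti_int __attribute__ ((mode (__TI__)));
typedef unsigned int ti_uint __attribute__ ((mode (__TI__)));
```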


Re: GCC documentation: porting to Sphinx

2021-06-28 Thread Martin Liška

On 6/28/21 12:23 PM, Arnaud Charlet wrote:

>> I've got something that is very close to being a patch candidate that can
>> eventually be merged. Right now, the patches are available here:
>> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;h=refs/users/marxin/heads/sphinx-v3


> FWIW I would prefer to review the changes posted here directly with all
> the details.


Sure, I'm going to send a proper patch set in an hour or so. As mentioned, I
won't be able to attach some of the patches as they exceed the 1MB email limit.



> In particular, can you explain the motivation behind all the changes in the
> gcc/ada/doc directory?


Sure:
1) All Sphinx manuals live in a directory where the index page is called
index.rst. That's why I moved e.g. gcc/ada/doc/{gnat_rm.rst => gnat_rm/index.rst}.
2) I renamed latex_elements.py to ada_latex_elements.py as it clashes with the
Sphinx global variable you set in Sphinx config files.
3) I created a shared Ada config (adabaseconf.py) that extends doc/baseconf.py
and sets what is shared between the 3 Ada manuals.
4) gnu_free_documentation_license.rst is taken from $root/doc/.


> That's a lot of moving files around, so I'd like
> to understand why and make sure these are not gratuitous changes, since
> moving files around is always tricky and I'd rather not have to undo it
> later in case this causes trouble or has unexpected consequences.


I hope that explains all the relevant changes.



> Otherwise, glad to see the switch to sphinx finally moving in gcc!


You're welcome. I would be interested in testing your PRO configuration (based
on Gnat_Build_Type, see get_gnat_build_type), and I'm curious whether you're fine
with the Sphinx template change. It will be the same as for the other manuals.

Cheers,
Martin



> Arno





Re: [Patch] Add 'default' to -foffload=; document that flag [PR67300]

2021-06-28 Thread Tobias Burnus

Hi Sandra, hi all,

On 19.06.21 00:47, Sandra Loosemore wrote:

> Thanks. The description of the options is a lot easier to follow now,
> so I mostly have only nit-picky Texinfo/grammar/terminology comments
> about the docs now.

Thanks for your comments/wording suggestions.

> The -f options are alphabetized in most of the other @gccoptlist
> tables in the option summary section.  I'm not sure why this group
> isn't, but you get extra credit if you fix that, too.


Done so. While doing so and then also sorting the list below, I noticed:
* optlist still had -fallow-single-precision but the entry was removed
  in commit f458d1d5d7bd85e412689858ea5d5de681608fbb
* there is -fgnu-tm - but it was missing from the optlist
* I did not fully sort -fsigned-bitfields  -funsigned-bitfields as
  those are in a single entry; hence, I also kept
  -fsigned-char  -funsigned-char together. That also helps when
  reading as they belong together.


> +@smallexample
> +-foffload=-lgfortran -foffload=-lm


I noticed that this should now be -foffload-options= ...; now fixed.

Except for the doc/invoke.texi changes, the patch is unchanged compared to the
previous version.

OK?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
Add 'default' to -foffload=; document that flag [PR67300]

As -foffload={options,targets,targets=options} is very convoluted,
it has been split into -foffload=targets (supporting the old syntax
for backward compatibility) and -foffload-options={options,target=options}.

Only the new syntax is documented.

Additionally, -foffload=default is supported, which can reset the
devices after -foffload=disable / -foffload=targets to the default,
if needed.

gcc/ChangeLog:

	* common.opt (-foffload=): Update description.
	(-foffload-options=): New.
	* doc/invoke.texi (C Language Options): Sort options
	alphabetically in optlist and also the description itself.
	(-foffload, -foffload-options): New.
	* gcc.c (check_offload_target_name): New, split off from
	handle_foffload_option.
	(check_foffload_target_names): New.
	(handle_foffload_option): Handle -foffload=default.
	(driver_handle_option): Update for -foffload-options.
	* lto-opts.c (lto_write_options): Use -foffload-options
	instead of -foffload.
	* lto-wrapper.c (merge_and_complain, append_offload_options):
	Likewise.
	* opts.c (common_handle_option): Likewise.

diff --git a/gcc/common.opt b/gcc/common.opt
index a1353e06bdc..a695a8c5964 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2095,9 +2095,15 @@ fnon-call-exceptions
 Common Var(flag_non_call_exceptions) Optimization
 Support synchronous non-call exceptions.
 
+; -foffload= is documented
+; -foffload== is supported for backward compatibility
 foffload=
-Common Driver Joined MissingArgError(options or targets missing after %qs)
--foffload==	Specify offloading targets and options for them.
+Driver Joined MissingArgError(targets missing after %qs)
+-foffload=	Specify offloading targets
+
+foffload-options=
+Common Driver Joined MissingArgError(options or targets=options missing after %qs)
+-foffload==	Specify options for the offloading targets
 
 foffload-abi=
 Common Joined RejectNegative Enum(offload_abi) Var(flag_offload_abi) Init(OFFLOAD_ABI_UNSET)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index af2ce189fae..f8e41d41801 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -197,17 +197,17 @@ in the following sections.
 
 @item C Language Options
 @xref{C Dialect Options,,Options Controlling C Dialect}.
-@gccoptlist{-ansi  -std=@var{standard}  -fgnu89-inline @gol
--fpermitted-flt-eval-methods=@var{standard} @gol
--aux-info @var{filename}  -fallow-parameterless-variadic-functions @gol
--fno-asm  -fno-builtin  -fno-builtin-@var{function}  -fgimple@gol
--fhosted  -ffreestanding @gol
+@gccoptlist{-ansi  -std=@var{standard}  -aux-info @var{filename} @gol
+-fallow-parameterless-variadic-functions  -fno-asm  @gol
+-fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch @gol
+-ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted @gol
+-flax-vector-conversions  -fms-extensions @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
+-foffload=@var{arg} -foffload-options=@var{arg} @gol
 -fopenmp  -fopenmp-simd @gol
--fms-extensions  -fplan9-extensions  -fsso-struct=@var{endianness} @gol
--fallow-single-precision  -fcond-mismatch  -flax-vector-conversions @gol
--fsigned-bitfields  -fsigned-char @gol
--funsigned-bitfields  -funsigned-char}
+-fpermitted-flt-eval-methods=@var{standard} @gol
+-fplan9-extensions -fsigned-bitfields -funsigned-bitfields @gol
+-fsigned-char -funsigned-char -fsso-struct=@var{endianness}}
 
 @item C++ Language Options
 @xref{C++ Dialect Options,,Options Controlling C++ Dialect}.
@@ -2448,50 +2448,6 @@ and will almost certainly change in incompatible ways in future
 releases.
 @end table
 
-@item -fgnu89-inline

[PATCH][pushed] mklog: Handle correctly long lines.

2021-06-28 Thread Martin Liška

Long lines need special handling.

Martin

contrib/ChangeLog:

	* mklog.py: Handle correctly long lines.
	* test_mklog.py: Test it.
---
 contrib/mklog.py  | 22 ++
 contrib/test_mklog.py | 25 +
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 674c1dcd78b..ba70af0eef2 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -38,6 +38,9 @@ import requests
 
 from unidiff import PatchSet
 
+LINE_LIMIT = 100
+TAB_WIDTH = 8
+
 pr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PPR [a-z+-]+\/[0-9]+)')
 prnum_regex = re.compile(r'PR (?P[a-z+-]+)/(?P[0-9]+)')
 dr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PDR [0-9]+)')
@@ -134,6 +137,16 @@ def get_pr_titles(prs):
 return '\n'.join(output)
 
 
+def append_changelog_line(out, relative_path, text):
+    line = f'\t* {relative_path}:'
+    if len(line.replace('\t', ' ' * TAB_WIDTH) + ' ' + text) <= LINE_LIMIT:
+        out += f'{line} {text}\n'
+    else:
+        out += f'{line}\n'
+        out += f'\t{text}\n'
+    return out
+
+
 def generate_changelog(data, no_functions=False, fill_pr_titles=False,
additional_prs=None):
 changelogs = {}
@@ -213,12 +226,12 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False,
 relative_path = file.path[len(changelog):].lstrip('/')
 functions = []
 if file.is_added_file:
-msg = 'New test' if in_tests else 'New file'
-out += '\t* %s: %s.\n' % (relative_path, msg)
+msg = 'New test.' if in_tests else 'New file.'
+out = append_changelog_line(out, relative_path, msg)
 elif file.is_removed_file:
-out += '\t* %s: Removed.\n' % (relative_path)
+out = append_changelog_line(out, relative_path, 'Removed.')
 elif hasattr(file, 'is_rename') and file.is_rename:
-out += '\t* %s: Moved to...\n' % (relative_path)
+out = append_changelog_line(out, relative_path, 'Moved to...')
 new_path = file.target_file[2:]
 # A file can be theoretically moved to a location that
 # belongs to a different ChangeLog.  Let user fix it.
@@ -227,6 +240,7 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False,
 out += '\t* %s: ...here.\n' % (new_path)
 elif os.path.basename(file.path) in generated_files:
 out += '\t* %s: Regenerate.\n' % (relative_path)
+append_changelog_line(out, relative_path, 'Regenerate.')
 else:
 if not no_functions:
 for hunk in file:
diff --git a/contrib/test_mklog.py b/contrib/test_mklog.py
index f5e9ecd577c..bf2f280b46e 100755
--- a/contrib/test_mklog.py
+++ b/contrib/test_mklog.py
@@ -443,6 +443,27 @@ gcc/ChangeLog:
 
 '''
 
+PATCH10 = '''\
+diff --git a/libgomp/doc/the-libgomp-abi/implementing-firstprivate-lastprivate-copyin-and-copyprivate-clauses.rst b/libgomp/doc/the-libgomp-abi/implementing-firstprivate-lastprivate-copyin-and-copyprivate-clauses.rst
+new file mode 100644
+index 000..ad3c6d856fc
+--- /dev/null
++++ b/libgomp/doc/the-libgomp-abi/implementing-firstprivate-lastprivate-copyin-and-copyprivate-clauses.rst
+@@ -0,0 +1,3 @@
++
++
++
+
+'''
+
+EXPECTED10 = '''\
+libgomp/ChangeLog:
+
+   * doc/the-libgomp-abi/implementing-firstprivate-lastprivate-copyin-and-copyprivate-clauses.rst:
+   New file.
+
+'''
+
 class TestMklog(unittest.TestCase):
     def test_macro_definition(self):
         changelog = generate_changelog(PATCH1)
@@ -485,3 +506,7 @@ class TestMklog(unittest.TestCase):
     def test_define_macro_parsing(self):
         changelog = generate_changelog(PATCH9)
         assert changelog == EXPECTED9
+
+    def test_long_filenames(self):
+        changelog = generate_changelog(PATCH10)
+        assert changelog == EXPECTED10
--
2.32.0
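The wrapping rule in append_changelog_line can be tried on its own. The snippet below is a standalone copy of the function from the patch, with the same LINE_LIMIT and TAB_WIDTH values; the two example paths are made up for illustration and are not taken from the GCC tree.

```python
# Standalone copy of the wrapping rule from the mklog.py patch above;
# the example paths below are made up for illustration.
LINE_LIMIT = 100
TAB_WIDTH = 8


def append_changelog_line(out, relative_path, text):
    """Append a tab-indented '* path: text' entry, wrapping if too wide."""
    line = f'\t* {relative_path}:'
    # Measure the rendered width, counting the leading tab as TAB_WIDTH columns.
    if len(line.replace('\t', ' ' * TAB_WIDTH) + ' ' + text) <= LINE_LIMIT:
        out += f'{line} {text}\n'
    else:
        # Too wide: keep the path alone and put the text on its own tab-indented line.
        out += f'{line}\n'
        out += f'\t{text}\n'
    return out


short = append_changelog_line('', 'mklog.py', 'New file.')
long_path = 'doc/the-libgomp-abi/' + 'x' * 90 + '.rst'
wrapped = append_changelog_line('', long_path, 'New file.')

print(repr(short))           # '\t* mklog.py: New file.\n'
print(wrapped.splitlines())  # the path line, then '\tNew file.'
```

Python strings are immutable, so each caller has to rebind the result (out = append_changelog_line(out, ...)); a bare call leaves out unchanged.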



Re: [committed] libstdc++: More workarounds in 17_intro/names.cc test [PR 97088]

2021-06-28 Thread Christophe LYON via Gcc-patches



On 25/06/2021 21:51, Jonathan Wakely via Libstdc++ wrote:

> Conditionally #undef some more names that are used in system headers.
>
> libstdc++-v3/ChangeLog:
>
> 	PR libstdc++/97088
> 	* testsuite/17_intro/names.cc: Undef more names for newlib and
> 	also for arm-none-linux-gnueabi.
> 	* testsuite/experimental/names.cc: Disable PCH.
>
> Tested powerpc64le-linux. Committed to trunk.


Hi Jonathan,

After disabling PCH, we now have the following failures on arm-eabi, using newlib-3.3:


FAIL: experimental/names.cc (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:194: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:195: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:196: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:197: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:198: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:199: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:200: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:201: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/time.h:110: error: expected unqualified-id before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/time.h:110: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/inttypes.h:323: error: expected ')' before ';' token


All this is a bit of a mess :-)

Christophe




[PATCH] Port GCC documentation to Sphinx

2021-06-28 Thread Martin Liška

Hello.

I'm sending the complete patch set, including ChangeLog entries. Unfortunately,
the majority of the patches are huge, which is why I'm sending a link to a tarball:
https://splichal.eu/tmp/port-to-sphinx-v1.tar

The tarball contains the following patches:

19e06194746 Ada: port to Sphinx.
9a744ca431d Remove unused TEX files.
e624967b5e8 Port jit to new Sphinx layout.
8c4717b262a Build system: support Sphinx
d102880437e Add include directives for target macros.
08c3d3f0d8d Add RST files with config files.

Thanks,
Martin


Re: [PATCH] Generalize -fuse-ld= to support absolute path or arbitrary ld.linker

2021-06-28 Thread Martin Liška

On 6/26/21 4:44 PM, Artur Sinila wrote:

> Not so gentle ping :)
> What should happen in order for this patch to be accepted?



Hello.

We came to the conclusion that one can use the currently supported option
-fuse-ld={bfd,gold,lld} together with -B pointing to an arbitrary path where
such a linker is expected.

Cheers,
Martin


Re: [PATCH v5 2/2] x86: Add vec_duplicate expander

2021-06-28 Thread H.J. Lu via Gcc-patches
On Sun, Jun 27, 2021 at 2:00 PM Richard Sandiford
 wrote:
>
> "H.J. Lu via Gcc-patches"  writes:
> > On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford
> >  wrote:
> >>
> >> "H.J. Lu"  writes:
> >> > 1. Update vec_duplicate to allow to fail so that backend can only allow
> >> > broadcasting an integer constant to a vector when broadcast instruction
> >> > is available.  This can be used by memset expander to avoid vec_duplicate
> >> > when loading from constant pool is more efficient.
> >>
> >> I don't see any changes in target-independent code though, other than
> >> the doc update.  It's still the case that (existing) uses of
> >> vec_duplicate_optab do not allow it to fail.
> >
> > I have a followup patch set on
> >
> > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast
> >
> > to use it to expand memset with vector broadcast:
> >
> > https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3
> >
> > For SSE2 which doesn't have vector broadcast, the constant vector broadcast
> > expander returns FAIL and load from constant pool will be used.
>
> Hmm, but as Jeff and I mentioned in the earlier replies,
> vec_duplicate_optab shouldn't be used for constants.  Constants
> should go via the move expanders instead.
>
> In a previous message I suggested:
>
>   … would it work to change:
>
>     /* Try using vec_duplicate_optab for uniform vectors.  */
>     if (!TREE_SIDE_EFFECTS (exp)
>         && VECTOR_MODE_P (mode)
>         && eltmode == GET_MODE_INNER (mode)
>         && ((icode = optab_handler (vec_duplicate_optab, mode))
>             != CODE_FOR_nothing)
>         && (elt = uniform_vector_p (exp)))
>
>   to something like:
>
>     /* Try using vec_duplicate_optab for uniform vectors.  */
>     if (!TREE_SIDE_EFFECTS (exp)
>         && VECTOR_MODE_P (mode)
>         && eltmode == GET_MODE_INNER (mode)
>         && (elt = uniform_vector_p (exp)))
>       {
>         if (TREE_CODE (elt) == INTEGER_CST
>             || TREE_CODE (elt) == POLY_INT_CST
>             || TREE_CODE (elt) == REAL_CST
>             || TREE_CODE (elt) == FIXED_CST)
>           {
>             rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
>             emit_move_insn (target, src);
>             break;
>           }
>         …
>       }
>
> if that code was the source of the constant operand.  If we're adding a
> new use of vec_duplicate_optab then that should be similarly protected
> against constant operands.
>

Your comments apply to my initial vec_duplicate patch that caused the
gcc.dg/pr100239.c failure.  It has been fixed by

commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515
Author: Richard Biener 
Date:   Mon Jun 7 20:08:13 2021 +0200

middle-end/100951 - make sure to generate VECTOR_CST in lowering

When vector lowering creates piecewise ops make sure to create
VECTOR_CSTs instead of CONSTRUCTORs when possible.

The problem I am running into now is in my memset vector broadcast
patch.  In order to optimize vector broadcast for memset, I need to
generate a pseudo register for

 __builtin_memset (ops, 3, 38);

only when vector broadcast is available:

  rtx target = nullptr;

  unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
  machine_mode vector_mode;
  if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
    gcc_unreachable ();

  enum insn_code icode = optab_handler (vec_duplicate_optab,
                                        vector_mode);
  if (icode != CODE_FOR_nothing)
    {
      rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
      class expand_operand ops[2];
      create_output_operand (&ops[0], reg, vector_mode);
      create_input_operand (&ops[1], data, QImode);
      if (maybe_expand_insn (icode, 2, ops))
        {
          if (!rtx_equal_p (reg, ops[0].value))
            emit_move_insn (reg, ops[0].value);
          target = lowpart_subreg (mode, reg, vector_mode);
        }
    }

  return target;  <<< Return nullptr to load from constant pool.

-- 
H.J.


Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-06-28 Thread Martin Liška

On 6/24/21 12:46 AM, Segher Boessenkool wrote:

> Hi!
>
> On Wed, Jun 23, 2021 at 03:22:34PM +0200, Martin Liška wrote:
>
>> As mentioned in the "Fallout: save/restore target options in
>> handle_optimize_attribute" thread, we need to support target option
>> restore of rs6000_long_double_type_size == FLOAT_PRECISION_TFmode.
>
> I have no idea?  Could you explain please?


Sure. A few weeks ago, we started using cl_target_option_{save,restore} calls
for optimize attributes (and pragmas) as well. The motivation was that optimize
options can influence target options (and vice versa).

As a consequence, FLOAT_PRECISION_TFmode must be accepted as a valid option
value for rs6000_long_double_type_size.




>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -4185,6 +4185,8 @@ rs6000_option_override_internal (bool global_init_p)
>> else
>> rs6000_long_double_type_size = default_long_double_size;
>>   }
>> +  else if (rs6000_long_double_type_size == FLOAT_PRECISION_TFmode)
>> +; /* The option can be restored a TREE_TARGET_OPTION.  */


> What does that mean?  It is not grammatical, and not obvious what it
> should mean.


Updated.




>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile { target { powerpc*-*-linux* } } } */


> Why on Linux only?  That doesn't sound right.  Do you need some other
> selector(s)?


Sorry, I copied the test-case.




>> +/* { dg-options "-O2 -mlong-double-128 -mabi=ibmlongdouble" } */
>> +
>> +extern unsigned long int x;
>> +extern float f (float);
>> +extern __typeof (f) f_power8;
>> +extern __typeof (f) f_power9;
>> +extern __typeof (f) f __attribute__ ((ifunc ("f_ifunc")));
>> +static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof (f) *


> -fno-stack-protector is default.


Yes, but one needs an optimize attribute in order to trigger the
cl_target_option_save/restore mechanism.

Martin




>> +f_ifunc (void)
>> +{
>> +  __typeof (f) *res = x ? f_power9 : f_power8;
>> +  return res;
>> +}


> The testcase should say what it is testing for, it is not obvious?
>
>
> Segher



From 1632939853fbf193f72ace3d1024a137d549fef4 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 1 Jun 2021 15:39:14 +0200
Subject: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

gcc/ChangeLog:

	* config/rs6000/rs6000.c (rs6000_option_override_internal): When
	a target option is restored, it can have
	rs6000_long_double_type_size set to FLOAT_PRECISION_TFmode.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pragma-optimize.c: New test.
---
 gcc/config/rs6000/rs6000.c |  2 ++
 gcc/testsuite/gcc.target/powerpc/pragma-optimize.c | 14 ++
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pragma-optimize.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2c249e186e1..fa4aa864c00 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4185,6 +4185,8 @@ rs6000_option_override_internal (bool global_init_p)
   else
 	rs6000_long_double_type_size = default_long_double_size;
 }
+  else if (rs6000_long_double_type_size == FLOAT_PRECISION_TFmode)
+; /* The option can be restored with cl_target_option_restore.  */
   else if (rs6000_long_double_type_size == 128)
 rs6000_long_double_type_size = FLOAT_PRECISION_TFmode;
   else if (global_options_set.x_rs6000_ieeequad)
diff --git a/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
new file mode 100644
index 000..2455fb57138
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-options "-O2 -mlong-double-128 -mabi=ibmlongdouble" } */
+
+extern unsigned long int x;
+extern float f (float);
+extern __typeof (f) f_power8;
+extern __typeof (f) f_power9;
+extern __typeof (f) f __attribute__ ((ifunc ("f_ifunc")));
+static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof (f) *
+f_ifunc (void)
+{
+  __typeof (f) *res = x ? f_power9 : f_power8;
+  return res;
+}
-- 
2.32.0



Re: [PATCH] Generalize -fuse-ld= to support absolute path or arbitrary ld.linker

2021-06-28 Thread Artur Sinila via Gcc-patches
On Mon, 2021-06-28 at 14:08 +0200, Martin Liška wrote:
> On 6/26/21 4:44 PM, Artur Sinila wrote:
> > Not so gentle ping :)
> > What should happen in order for this patch to be accepted?
> > 
> 
> Hello.
> 
> We came up to conclusion that one can use the currently supported
> option
> -fuse-ld={bfd,gold,lld} with -B that can point to an arbitrary path
> the such linker is expected.
> 
> Cheers,
> Martin

Hello.

Thank you for the reply. I'd like to use mold linker with gcc:
https://github.com/rui314/mold. So your solution doesn't help.

There are 2 options:
1. Add mold to -fuse-ld option
2. Implement --ld-path

The 2nd option is much more future-proof: you won't need to add a new
-fuse-ld variant each time a new linker comes along. To provide some
context: clang used to support passing a path to -fuse-ld, but since
clang 12 this has been deprecated in favor of the new --ld-path option.
I think we should follow clang's example and implement this useful
feature in gcc as well.

Cheers,
Artur Sinila




Re: [PATCH v5 2/2] x86: Add vec_duplicate expander

2021-06-28 Thread Richard Sandiford via Gcc-patches
"H.J. Lu"  writes:
> On Sun, Jun 27, 2021 at 2:00 PM Richard Sandiford
>  wrote:
>>
>> "H.J. Lu via Gcc-patches"  writes:
>> > On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford
>> >  wrote:
>> >>
>> >> "H.J. Lu"  writes:
>> >> > 1. Update vec_duplicate to allow to fail so that backend can only allow
>> >> > broadcasting an integer constant to a vector when broadcast instruction
>> >> > is available.  This can be used by memset expander to avoid vec_duplicate
>> >> > when loading from constant pool is more efficient.
>> >>
>> >> I don't see any changes in target-independent code though, other than
>> >> the doc update.  It's still the case that (existing) uses of
>> >> vec_duplicate_optab do not allow it to fail.
>> >
>> > I have a followup patch set on
>> >
>> > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast
>> >
>> > to use it to expand memset with vector broadcast:
>> >
>> > https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3
>> >
>> > For SSE2 which doesn't have vector broadcast, the constant vector broadcast
>> > expander returns FAIL and load from constant pool will be used.
>>
>> Hmm, but as Jeff and I mentioned in the earlier replies,
>> vec_duplicate_optab shouldn't be used for constants.  Constants
>> should go via the move expanders instead.
>>
>> In a previous message I suggested:
>>
>>   … would it work to change:
>>
>>     /* Try using vec_duplicate_optab for uniform vectors.  */
>>     if (!TREE_SIDE_EFFECTS (exp)
>>         && VECTOR_MODE_P (mode)
>>         && eltmode == GET_MODE_INNER (mode)
>>         && ((icode = optab_handler (vec_duplicate_optab, mode))
>>             != CODE_FOR_nothing)
>>         && (elt = uniform_vector_p (exp)))
>>
>>   to something like:
>>
>>     /* Try using vec_duplicate_optab for uniform vectors.  */
>>     if (!TREE_SIDE_EFFECTS (exp)
>>         && VECTOR_MODE_P (mode)
>>         && eltmode == GET_MODE_INNER (mode)
>>         && (elt = uniform_vector_p (exp)))
>>       {
>>         if (TREE_CODE (elt) == INTEGER_CST
>>             || TREE_CODE (elt) == POLY_INT_CST
>>             || TREE_CODE (elt) == REAL_CST
>>             || TREE_CODE (elt) == FIXED_CST)
>>           {
>>             rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
>>             emit_move_insn (target, src);
>>             break;
>>           }
>>         …
>>       }
>>
>> if that code was the source of the constant operand.  If we're adding a
>> new use of vec_duplicate_optab then that should be similarly protected
>> against constant operands.
>>
>
> Your comments apply to my initial vec_duplicate patch that caused the
> gcc.dg/pr100239.c failure.  It has been fixed by
>
> commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515
> Author: Richard Biener 
> Date:   Mon Jun 7 20:08:13 2021 +0200
>
> middle-end/100951 - make sure to generate VECTOR_CST in lowering
>
> When vector lowering creates piecewise ops make sure to create
> VECTOR_CSTs instead of CONSTRUCTORs when possible.
>
> The problem I am running into now is in my memset vector broadcast
> patch.  In order to optimize vector broadcast for memset, I need to
> generate a pseudo register for
>
>  __builtin_memset (ops, 3, 38);
>
> only when vector broadcast is available:
>
>   rtx target = nullptr;
>
>   unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
>   machine_mode vector_mode;
>   if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
>     gcc_unreachable ();
>
>   enum insn_code icode = optab_handler (vec_duplicate_optab,
>                                         vector_mode);
>   if (icode != CODE_FOR_nothing)
>     {
>       rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
>       class expand_operand ops[2];
>       create_output_operand (&ops[0], reg, vector_mode);
>       create_input_operand (&ops[1], data, QImode);
>       if (maybe_expand_insn (icode, 2, ops))
>         {
>           if (!rtx_equal_p (reg, ops[0].value))
>             emit_move_insn (reg, ops[0].value);
>           target = lowpart_subreg (mode, reg, vector_mode);
>         }
>     }
>
>   return target;  <<< Return nullptr to load from constant pool.

I don't think this is a correct use of vec_duplicate_optab.  If the
scalar operand is a constant then the move should always go through
the move expanders instead, as a move from a CONST_VECTOR.

Thanks,
Richard


Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-06-28 Thread Andrew MacLeod via Gcc-patches

On 6/27/21 11:46 AM, Aldy Hernandez wrote:



On 6/25/21 9:38 AM, Richard Biener wrote:
On Thu, Jun 24, 2021 at 5:01 PM Andrew MacLeod wrote:


On 6/24/21 9:25 AM, Andrew MacLeod wrote:

On 6/24/21 8:29 AM, Richard Biener wrote:


The original function in EVRP currently looks like:

  === BB 2 
  :
 if (a_5(D) == b_6(D))
   goto ; [INV]
 else
   goto ; [INV]

=== BB 8 
Equivalence set : [a_5(D), b_6(D)] edge 2->8 provides
a_5 and b_6 as equivalences
  :
 goto ; [100.00%]

=== BB 6 
  :
 # i_1 = PHI <0(8), i_10(5)>
 if (i_1 < a_5(D))
   goto ; [INV]
 else
   goto ; [INV]

=== BB 3 
Relational : (i_1 < a_5(D)) edge 6->3 provides
this relation
  :
 if (i_1 == b_6(D))
   goto ; [INV]
 else
   goto ; [INV]


So it knows that a_5 and b_6 are equivalent, and it knows that i_1 <
a_5 in BB3 as well.

So we should be able to indicate that i_1 == b_6 is [0,0], but we
currently aren't.  I think I had turned on equivalence mapping during
relational processing, so it should be able to tag that without
transitive relations...  I'll have a look at why.

And once we get a bit further along, you will be able to access this
without ranger, if one wants to simply register the relations directly.


Anyway, I'll get back to you on why it's currently being missed.

Andrew




As promised.  There was a typo in the equivalency comparisons... so it
was getting missed.  With the fix, the oracle identifies the relation
and evrp will now fold that expression away and the IL becomes:

     :
    if (a_5(D) == b_6(D))
  goto ; [INV]
    else
  goto ; [INV]

     :
    i_10 = i_1 + 1;

     :
    # i_1 = PHI <0(2), i_10(3)>
    if (i_1 < a_5(D))
  goto ; [INV]
    else
  goto ; [INV]

     :
    return;

For the other cases you quote, there are no predictions such that if a
!= 0 then this equivalency exists...

+  if (a != 0)
+    {
+  c = b;
+    }

but the oracle would register that in the TRUE block c and b are
equivalent... so some other pass that was interested in tracking
conditions that make a block relevant would be able to compare relations...


I guess to fully leverage optimizations for cases like

   if (a != 0)
 c = b;
   ...
   if (a != 0)
 {
 if (c == b)
...
 }

That is, we'd do simplifications exposed by jump threading but
without actually doing the jump threading (which will of course
not allow all possible simplifications w/o inserting extra PHIs
for computations we might want to re-use).


FWIW, as I mention in the PR, if the upcoming threader work could be
taught to use the relation oracle, it could easily solve the
conditional flowing through the a!=0 path.  However, we wouldn't be
able to thread it because in this particular case, the path crosses
loop boundaries.


I leave it to Jeff/others to pontificate on whether the jump-threader
path duplicator could be taught to thread through loops. ??


Aldy

This is still bouncing around in my head. I think we have the tools to
do this better than via threading.  Ranger is now trivially capable of
calculating when a predicate expression is true or false at another
location in the IL. Combine this with flagging relations that are true
when the predicate is, and that relation could simply be added to the
oracle.


i.e.:

     :
    if (a_5(D) != 0)
  goto ; [INV]
    else
  goto ; [INV]

     :
     :
    # c_1 = PHI 

the predicate and relations are:
    (a_5 != 0)  ->  c_1 == b_7
    (a_5 == 0) -> c_1 == c_6

then :

 :
    # i_2 = PHI <0(4), i_12(8)>
    if (c_1 > i_2)
  goto ; [INV]
    else
  goto ; [INV]

     :    edge 9->5 registers c_1 > i_2 with the oracle

    if (a_5(D) != 0)
  goto ; [INV]
    else
  goto ; [INV]

     :
    if (i_2 == b_7(D))
  goto ; [INV]
    else
  goto ; [INV]
..
If we know to check the predicate list in bb_6, ranger can answer the
question: on the branch in bb6, a_5 != 0.
This in turn means the predicated relation c_1 == b_7 can be applied to
bb6 and registered with the oracle.
Once that is done, since we already know c_1 > i_2, we'll fold i_2 == b_7
to [0, 0], as the equivalence between b_7 and c_1 is now applied.
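As a rough illustration of that folding step, here is a toy model in Python. It is only a sketch under simplifying assumptions and does not mirror GCC's ranger or relation-oracle interfaces: RelationOracle, fold_eq and the tuple encoding of predicates are hypothetical names, equivalences live in a small union-find structure, and the only other relation tracked is strict less-than. The predicated equivalence c_1 == b_7 is held back until a_5 != 0 is known, and once it is applied the existing c_1 > i_2 fact is enough to fold i_2 == b_7 to false.

```python
# Toy model only: class, method names and predicate encoding are hypothetical
# and do not correspond to GCC's ranger / relation oracle API.
class RelationOracle:
    """Equivalence classes (union-find) plus strict less-than facts."""

    def __init__(self):
        self.parent = {}   # SSA name -> parent in the union-find forest
        self.less = set()  # (x, y) pairs meaning x < y

    def _find(self, name):
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            name = self.parent[name]
        return name

    def register_equiv(self, x, y):
        rx, ry = self._find(x), self._find(y)
        if rx != ry:
            self.parent[rx] = ry

    def register_less(self, x, y):
        self.less.add((x, y))

    def fold_eq(self, x, y):
        """True/False if x == y is known, None if the oracle cannot tell."""
        rx, ry = self._find(x), self._find(y)
        if rx == ry:
            return True
        for a, b in self.less:
            # A strict ordering between the two classes rules out equality.
            if {self._find(a), self._find(b)} == {rx, ry}:
                return False
        return None


# Predicated relations for the PHI defining c_1 in the example above.
predicated = {
    ('a_5', '!=', '0'): [('c_1', 'b_7')],
    ('a_5', '==', '0'): [('c_1', 'c_6')],
}

oracle = RelationOracle()
oracle.register_less('i_2', 'c_1')     # edge 9->5: c_1 > i_2
before = oracle.fold_eq('i_2', 'b_7')  # None: nothing links b_7 yet

# Suppose ranger proves a_5 != 0 on the branch in bb6: apply its relations.
for x, y in predicated[('a_5', '!=', '0')]:
    oracle.register_equiv(x, y)
after = oracle.fold_eq('i_2', 'b_7')   # False, i.e. i_2 == b_7 folds to [0, 0]
print(before, after)
```

The point of keeping the predicated relations in a side table is that nothing is registered speculatively; they only enter the oracle in blocks where the controlling predicate has been proven.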


So the capability is built in; it boils down to finding the predicated
relations we care about and knowing to apply them.
This one is pretty straightforward because the condition is exactly the
same.  When we see a_5 != 0, a_5 is in the export list and we just
check to see if there are any predicates flagged on a_5.  The actual
expression could be more complicated, and it would still be able to
answer it.  This is very similar to how the new threader finds
threads: it matches imports and exports.  Here we mostly care just
about the exports and figuring out what the predicates we care about are.

Anyway, theres a fit in there some

[committed] libstdc++: Implement LWG 415 for std::ws

2021-06-28 Thread Jonathan Wakely via Gcc-patches
For C++11 std::ws changed to be an unformatted input function, meaning
it constructs a sentry and sets badbit on exceptions.

libstdc++-v3/ChangeLog:

	* doc/xml/manual/intro.xml: Document LWG 415 change.
	* doc/html/manual/bugs.html: Regenerate.
	* include/bits/istream.tcc (ws): Create sentry and catch
	exceptions.
	* testsuite/27_io/basic_istream/ws/char/lwg415.cc: New test.
	* testsuite/27_io/basic_istream/ws/wchar_t/lwg415.cc: New test.

Tested powerpc64le-linux. Committed to trunk.
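
A small usage sketch of std::ws.  It assumes only the standard
<sstream> interface; the helper name ws_state is made up for
illustration.  It shows the two outcomes that hold both before and
after this change:

- std::ws stops at the first non-whitespace character.
- It sets eofbit (not failbit) when only whitespace remains.

What LWG 415 adds is the sentry, constructed with noskipws set to true
as in the diff.  With a libstdc++ containing this commit, std::ws
applied to a stream that is not good() gets failbit instead of reading
from the buffer, which is what test01 in the new test checks.  An
exception thrown while skipping sets badbit.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: apply std::ws to TEXT and return the resulting
// state bits.  NEXT receives the first remaining character, or '\0' if
// the stream is no longer good() after skipping.
std::ios_base::iostate ws_state(const std::string& text, char& next)
{
  std::istringstream in(text);
  in >> std::ws;                        // skip leading whitespace only
  // Capture the state before peek(): peek() also builds a sentry and
  // would add failbit to a stream that is already at end of input.
  const std::ios_base::iostate state = in.rdstate();
  assert(!(state & std::ios_base::badbit));  // string streams do not throw here
  next = '\0';
  if (in.good())
    next = static_cast<char>(in.peek());
  return state;
}
```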

commit e5c422b7d8af6f42f8ab230133210742b7ac5661
Author: Jonathan Wakely 
Date:   Fri Jun 25 21:33:02 2021

libstdc++: Implement LWG 415 for std::ws

For C++11 std::ws changed to be an unformatted input function, meaning
it constructs a sentry and sets badbit on exceptions.

libstdc++-v3/ChangeLog:

* doc/xml/manual/intro.xml: Document LWG 415 change.
* doc/html/manual/bugs.html: Regenerate.
* include/bits/istream.tcc (ws): Create sentry and catch
exceptions.
* testsuite/27_io/basic_istream/ws/char/lwg415.cc: New test.
* testsuite/27_io/basic_istream/ws/wchar_t/lwg415.cc: New test.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml 
b/libstdc++-v3/doc/xml/manual/intro.xml
index 45762caa711..86ed6964b6a 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -634,6 +634,13 @@ requirements of the license of GCC.
 Have open clear the error flags.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="&DR;#415">415:
+   Behavior of std::ws
+
+Change it to be an unformatted input function
+  (i.e. construct a sentry and catch exceptions).
+
+
 http://www.w3.org/1999/xlink"; 
xlink:href="../ext/lwg-closed.html#431">431:
Swapping containers with unequal allocators
 
diff --git a/libstdc++-v3/include/bits/istream.tcc 
b/libstdc++-v3/include/bits/istream.tcc
index 1b046bec937..2a153c2e140 100644
--- a/libstdc++-v3/include/bits/istream.tcc
+++ b/libstdc++-v3/include/bits/istream.tcc
@@ -1057,17 +1057,43 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       typedef typename __istream_type::int_type          __int_type;
       typedef ctype<_CharT>                              __ctype_type;
 
-      const __ctype_type& __ct = use_facet<__ctype_type>(__in.getloc());
-      const __int_type __eof = _Traits::eof();
-      __streambuf_type* __sb = __in.rdbuf();
-      __int_type __c = __sb->sgetc();
+      // _GLIBCXX_RESOLVE_LIB_DEFECTS
+      // 451. behavior of std::ws
+      typename __istream_type::sentry __cerb(__in, true);
+      if (__cerb)
+        {
+          ios_base::iostate __err = ios_base::goodbit;
+          __try
+            {
+              const __ctype_type& __ct = use_facet<__ctype_type>(__in.getloc());
+              const __int_type __eof = _Traits::eof();
+              __streambuf_type* __sb = __in.rdbuf();
+              __int_type __c = __sb->sgetc();
 
-      while (!_Traits::eq_int_type(__c, __eof)
-             && __ct.is(ctype_base::space, _Traits::to_char_type(__c)))
-        __c = __sb->snextc();
-
-       if (_Traits::eq_int_type(__c, __eof))
-         __in.setstate(ios_base::eofbit);
+              while (true)
+                {
+                  if (_Traits::eq_int_type(__c, __eof))
+                    {
+                      __err = ios_base::eofbit;
+                      break;
+                    }
+                  if (!__ct.is(ctype_base::space, _Traits::to_char_type(__c)))
+                    break;
+                  __c = __sb->snextc();
+                }
+            }
+          __catch (const __cxxabiv1::__forced_unwind&)
+            {
+              __in._M_setstate(ios_base::badbit);
+              __throw_exception_again;
+            }
+          __catch (...)
+            {
+              __in._M_setstate(ios_base::badbit);
+            }
+          if (__err)
+            __in.setstate(__err);
+        }
       return __in;
     }
 
diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ws/char/lwg415.cc 
b/libstdc++-v3/testsuite/27_io/basic_istream/ws/char/lwg415.cc
new file mode 100644
index 000..fe6980dff29
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_istream/ws/char/lwg415.cc
@@ -0,0 +1,77 @@
+#include 
+
+// C++11 27.7.2.4 Standard basic_istream manipulators [istream.manip]
+//
+// LWG 415. behavior of std::ws
+// std::ws is an unformatted input function.
+
+#include 
+#include 
+#include 
+
+void
+test01()
+{
+  std::istream is(0);
+  VERIFY( is.rdstate() == std::ios_base::badbit );
+
+  is >> std::ws; // sentry should set failbit
+  VERIFY( is.rdstate() & std::ios_base::failbit );
+}
+
+void
+test02()
+{
+  __gnu_test::sync_streambuf buf;
+  std::istream is(&buf);
+
+  __gnu_test::sync_streambuf buf_tie;
+  std::ostream os_tie(&buf_tie);
+
+  // A sentry should be constructed so is.tie()->flush() should be called.
+  // The standard allows the flush to be deferred becau

[committed] libstdc++: Allow unique_ptr::operator[] [PR 101236]

2021-06-28 Thread Jonathan Wakely via Gcc-patches
PR libstdc++/101236 shows that LLVM depends on being able to use
unique_ptr::operator[] when T is incomplete. This is undefined, but
previously worked with libstdc++. When I added the conditional noexcept
to that operator we started to diagnose the incomplete type.

This change restores support for that case, by making the noexcept
condition check that the type is complete before checking whether
indexing on the pointer can throw.  A workaround for PR c++/101239 is
needed to avoid a bogus error where G++ fails to do SFINAE on the
ill-formed p[n] expression and gets an ICE. Instead of checking that the
p[n] expression is valid in the trailing-return-type, we only check that
the element_type is complete.

libstdc++-v3/ChangeLog:

PR libstdc++/101236
* include/bits/unique_ptr.h (unique_ptr::operator[]):
Fail gracefully if element_type is incomplete.
* testsuite/20_util/unique_ptr/cons/incomplete.cc: Clarify that
the standard doesn't require this test to work for array types.
* testsuite/20_util/unique_ptr/lwg2762.cc: Check that incomplete
types can be used with array specialization.
* testsuite/20_util/unique_ptr/101236.cc: New test.

Tested powerpc64le-linux. Committed to trunk.
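
The technique used here can be tried outside libstdc++.  Below is a
minimal standalone sketch with hypothetical names (nothrow_index,
ThrowingIndex).  It follows the shape of _S_nothrow_deref in the diff
but is not the library's code.

- The first overload is removed by SFINAE when sizeof(Elt) is
  ill-formed, i.e. when Elt is incomplete.
- Overload resolution then falls back to the ellipsis overload, which
  answers false instead of producing a hard error.
- When both overloads are viable, the size_t parameter is a better
  match for the argument 0 than the ellipsis.

```cpp
#include <cstddef>
#include <utility>

// Viable only if Elt is complete: sizeof(Elt) in the trailing return type
// is a substitution failure for an incomplete type.  Reports whether
// p[n] is noexcept for a pointer-like Ptr.
template<typename Ptr, typename Elt>
constexpr auto nothrow_index(std::size_t n)
  -> decltype(sizeof(Elt) != 0)
{ return noexcept(std::declval<Ptr>()[n]); }

// Fallback used when the overload above is discarded: be conservative.
template<typename Ptr, typename Elt>
constexpr bool nothrow_index(...)
{ return false; }

// A pointer-like type whose operator[] is not noexcept.
struct ThrowingIndex
{
  int* p;
  int& operator[](std::size_t i) const { return p[i]; }
};

struct Incomplete;  // deliberately never defined

static_assert(nothrow_index<int*, int>(0), "built-in indexing cannot throw");
static_assert(!nothrow_index<ThrowingIndex, int>(0), "operator[] may throw");
static_assert(!nothrow_index<Incomplete*, Incomplete>(0), "incomplete: fallback");
```

As the patch's own comment says, the standard requires a complete type
here, so this is a best-effort extension rather than guaranteed
behaviour.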

commit b7a89c041aa1d67654f1ba7b2839e221c3e14748
Author: Jonathan Wakely 
Date:   Mon Jun 28 12:59:19 2021

libstdc++: Allow unique_ptr::operator[] [PR 101236]

PR libstdc++/101236 shows that LLVM depends on being able to use
unique_ptr::operator[] when T is incomplete. This is undefined, but
previously worked with libstdc++. When I added the conditional noexcept
to that operator we started to diagnose the incomplete type.

This change restores support for that case, by making the noexcept
condition check that the type is complete before checking whether
indexing on the pointer can throw.  A workaround for PR c++/101239 is
needed to avoid a bogus error where G++ fails to do SFINAE on the
ill-formed p[n] expression and gets an ICE. Instead of checking that the
p[n] expression is valid in the trailing-return-type, we only check that
the element_type is complete.

libstdc++-v3/ChangeLog:

PR libstdc++/101236
* include/bits/unique_ptr.h (unique_ptr::operator[]):
Fail gracefully if element_type is incomplete.
* testsuite/20_util/unique_ptr/cons/incomplete.cc: Clarify that
the standard doesn't require this test to work for array types.
* testsuite/20_util/unique_ptr/lwg2762.cc: Check that incomplete
types can be used with array specialization.
* testsuite/20_util/unique_ptr/101236.cc: New test.

diff --git a/libstdc++-v3/include/bits/unique_ptr.h 
b/libstdc++-v3/include/bits/unique_ptr.h
index 1781fe15649..e478056c755 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -491,6 +491,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  = __and_< is_base_of<_Tp, _Up>,
__not_, __remove_cv<_Up>>> >;
 
+  // This checks whether p[n] is noexcept, but fails gracefully when
+  // element_type is incomplete. The standard requires a complete type
+  // for unique_ptr, but we try to support it anyway (PR 101236).
+  template
+   static constexpr auto
+   _S_nothrow_deref(size_t __n)
+   -> decltype(sizeof(_Elt) != 0) // PR c++/101239
+   { return noexcept(std::declval<_Ptr>()[__n]); }
+
+  template
+   static constexpr bool
+   _S_nothrow_deref(...)
+   { return false; }
+
 public:
   using pointer  = typename __uniq_ptr_impl<_Tp, _Dp>::pointer;
   using element_type  = _Tp;
@@ -655,7 +669,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Access an element of owned array.
   typename std::add_lvalue_reference::type
   operator[](size_t __i) const
-  noexcept(noexcept(std::declval()[std::declval()]))
+  noexcept(_S_nothrow_deref(0))
   {
__glibcxx_assert(get() != pointer());
return get()[__i];
diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/101236.cc 
b/libstdc++-v3/testsuite/20_util/unique_ptr/101236.cc
new file mode 100644
index 000..2f55f4baf9a
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/101236.cc
@@ -0,0 +1,13 @@
+// { dg-do compile { target c++11 } }
+#include 
+
+struct Incomplete;
+struct pr101236
+{
+  // The standard says "T shall be a complete type" for unique_ptr
+  // so this is a GCC extension.
+  std::unique_ptr p;
+
+  Incomplete& f() { return p[0]; }
+};
+struct Incomplete { };
diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/cons/incomplete.cc 
b/libstdc++-v3/testsuite/20_util/unique_ptr/cons/incomplete.cc
index 879a1d021a1..6b55d5744ed 100644
--- a/libstdc++-v3/testsuite/20_util/unique_ptr/cons/incomplete.cc
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/cons/incomplete.cc
@@ -24,9 +24,17 @@ struct Incomplete;
 v

Re: [PATCH] Generalize -fuse-ld= to support absolute path or arbitrary ld.linker

2021-06-28 Thread Martin Liška

On 6/28/21 2:24 PM, Artur Sinila wrote:
> On Mon, 2021-06-28 at 14:08 +0200, Martin Liška wrote:
> > On 6/26/21 4:44 PM, Artur Sinila wrote:
> > > Not so gentle ping :)
> > > What should happen in order for this patch to be accepted?
> > 
> > Hello.
> > 
> > We came to the conclusion that one can use the currently supported
> > option -fuse-ld={bfd,gold,lld} with -B, which can point to an
> > arbitrary path where such a linker is expected.
> > 
> > Cheers,
> > Martin
> 
> Hello.
> 
> Thank you for the reply. I'd like to use mold linker with gcc:
> https://github.com/rui314/mold. So your solution doesn't help.

Well, kind of works. You only need to create a symlink called
ld which will point to your linker (plus using -B argument as mentioned).

> There are 2 options:
> 1. Add mold to -fuse-ld option
> 2. Implement --ld-path
> 
> The 2nd option is much more future-proof: you won't need to add new
> -fuse-ld variant each time new linker comes up. To provide some
> context: clang had been supporting passing path to -fuse-ld, but since
> clang 12 this is deprecated in favor of new --ld-path option. I think
> we should take an example from clang and implement this useful feature
> in gcc as well.

Can you please provide a pointer for the deprecation.
I'm adding Jakub who recommended using the -B argument.

Martin

> Cheers,
> Artur Sinila

Re: [PATCH] Generalize -fuse-ld= to support absolute path or arbitrary ld.linker

2021-06-28 Thread Artur Sinila via Gcc-patches
On Mon, 2021-06-28 at 15:26 +0200, Martin Liška wrote:
> On 6/28/21 2:24 PM, Artur Sinila wrote:
> > On Mon, 2021-06-28 at 14:08 +0200, Martin Liška wrote:
> > > On 6/26/21 4:44 PM, Artur Sinila wrote:
> > > > Not so gentle ping :)
> > > > What should happen in order for this patch to be accepted?
> > > > 
> > > 
> > > Hello.
> > > 
> > > We came to the conclusion that one can use the currently supported
> > > option -fuse-ld={bfd,gold,lld} with -B, which can point to an
> > > arbitrary path where such a linker is expected.
> > > 
> > > Cheers,
> > > Martin
> > 
> > Hello.
> > 
> > Thank you for the reply. I'd like to use mold linker with gcc:
> > https://github.com/rui314/mold. So your solution doesn't help.
> 
> Well, kind of works. You only need to create a symlink called
> ld which will point to your linker (plus using -B argument as
> mentioned).
> 
> > 
> > There are 2 options:
> > 1. Add mold to -fuse-ld option
> > 2. Implement --ld-path
> > 
> > The 2nd option is much more future-proof: you won't need to add new
> > -fuse-ld variant each time new linker comes up. To provide some
> > context: clang had been supporting passing path to -fuse-ld, but
> > since
> > clang 12 this is deprecated in favor of new --ld-path option. I
> > think
> > we should take an example from clang and implement this useful
> > feature
> > in gcc as well.
> 
> Can you please provide a pointer for the deprecation.
> I'm adding Jakub who recommended using the -B argument.
> 
> Martin
> 
> > 
> > Cheers,
> > Artur Sinila
> > 
> 

See https://reviews.llvm.org/D83015. Speaking about the -B option, AFAIK it
will make gcc search for collect2, cc and other tools in the specified
directory, so apart from creating a symlink to mold, I'll also need to
create symlinks to all those tools. Didn't try it in practice though.

Best regards,
Artur Sinila



Re: [PATCH] Generalize -fuse-ld= to support absolute path or arbitrary ld.linker

2021-06-28 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 28, 2021 at 03:26:21PM +0200, Martin Liška wrote:
> > There are 2 options:
> > 1. Add mold to -fuse-ld option
> > 2. Implement --ld-path
> > 
> > The 2nd option is much more future-proof: you won't need to add new
> > -fuse-ld variant each time new linker comes up. To provide some
> > context: clang had been supporting passing path to -fuse-ld, but since
> > clang 12 this is deprecated in favor of new --ld-path option. I think
> > we should take an example from clang and implement this useful feature
> > in gcc as well.
> 
> Can you please provide a pointer for the deprecation.
> I'm adding Jakub who recommended using the -B argument.

-B will work with any gcc version, at least from the past 3+ decades:
just mkdir /whatever/dir/, put the linker (or a symlink to it) there
under the basename ld, then pass -B /whatever/dir/

--ld-path= is a bad idea; it doesn't follow any of the usual option
naming conventions.

Note that all these extra linkers (lld, mold) will not really work
properly: during configuration gcc detects various assembler and linker
properties that it then relies on, and I'm sure neither lld nor mold
supports those features.

HAVE_LD_ALIGNED_SHF_MERGE
HAVE_LD_AS_NEEDED
HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH
HAVE_LD_BNDPLT_SUPPORT
HAVE_LD_BROKEN_PE_DWARF5
HAVE_LD_BUILDID
HAVE_LD_CLEARCAP
HAVE_LD_COMPRESS_DEBUG
HAVE_LD_DEMANGLE
HAVE_LD_EH_FRAME_CIEV3
HAVE_LD_EH_FRAME_HDR
HAVE_LD_EH_GC_SECTIONS
HAVE_LD_EH_GC_SECTIONS_BUG
HAVE_LD_LARGE_TOC
HAVE_LD_NO_DOT_SYMS
HAVE_LD_PERSONALITY_RELAXATION
HAVE_LD_PIE
HAVE_LD_PIE_COPYRELOC
HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE
HAVE_LD_PUSHPOPSTATE_SUPPORT
HAVE_LD_RO_RW_SECTION_MIXING
HAVE_LD_SOL2_EMULATION
HAVE_LD_STATIC_DYNAMIC
HAVE_LD_SYSROOT

is what is currently tested (not all of these for all targets).

Jakub



Re: [PATCH] Generalize -fuse-ld= to support absolute path or arbitrary ld.linker

2021-06-28 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 28, 2021 at 04:41:06PM +0300, Artur Sinila wrote:
> See https://reviews.llvm.org/D83015. Speaking about the -B option, AFAIK it
> will make gcc search for collect2, cc and other tools in the specified
> directory, so apart from creating a symlink to mold, I'll also need to
> create symlinks to all those tools. Didn't try it in practice though.

You don't.  -B adds a path to the list of paths searched for the various
tools etc.  If the tool is not found in any of those directories, the
search continues in the standard paths etc. as described in the
documentation.
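
The -B behaviour described above can be modelled as a search over two
ordered lists of directories.  This is a simplified, hypothetical
model: find_program and the exists callback are made-up names, and the
paths are examples only.  GCC's real driver also tries machine- and
version-specific subdirectories of each prefix, which is omitted here.
It shows why only ld needs to be placed in the -B directory.  Anything
not found there, such as collect2, is still found in the standard
locations.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Simplified, hypothetical model of the lookup: every -B prefix is tried
// first, in command-line order, then the standard directories.  The first
// existing candidate wins; an empty string means "not found".
std::string find_program(const std::string& name,
                         const std::vector<std::string>& b_prefixes,
                         const std::vector<std::string>& standard_dirs,
                         const std::function<bool(const std::string&)>& exists)
{
  assert(!name.empty());
  for (const std::vector<std::string>* dirs : {&b_prefixes, &standard_dirs})
    for (const std::string& dir : *dirs)
      {
        // -B values act as plain prefixes, hence "-B /whatever/dir/"
        // with a trailing slash.
        const std::string candidate = dir + name;
        if (exists(candidate))
          return candidate;
      }
  return "";
}
```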

Jakub



PING: [PATCH] mips: Fix up mips_atomic_assign_expand_fenv [PR94780]

2021-06-28 Thread Xi Ruoyao via Gcc-patches
Ping.  CC'ing several maintainers who may be able to review MIPS patches.
Sorry for the noise.

On Wed, 2021-06-23 at 11:11 +0800, Xi Ruoyao wrote:
> Commit message shamelessly copied from 1777beb6b129 by jakub:
> 
> This function, because it is sometimes called even outside of function
> bodies, uses create_tmp_var_raw rather than create_tmp_var.  But in order
> for that to work, when first referenced, the VAR_DECLs need to appear in a
> TARGET_EXPR so that during gimplification the var gets the right
> DECL_CONTEXT and is added to local decls.
> 
> Bootstrapped & regtested on mips64el-linux-gnu.  Ok for trunk and backport
> to 11, 10, and 9?
> 
> gcc/
> 
> * config/mips/mips.c (mips_atomic_assign_expand_fenv): Use
>   TARGET_EXPR instead of MODIFY_EXPR.
> ---
>  gcc/config/mips/mips.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 8f043399a8e..89d1be6cea6 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -22439,12 +22439,12 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
>    tree get_fcsr = mips_builtin_decls[MIPS_GET_FCSR];
>    tree set_fcsr = mips_builtin_decls[MIPS_SET_FCSR];
>    tree get_fcsr_hold_call = build_call_expr (get_fcsr, 0);
> -  tree hold_assign_orig = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> -                                  fcsr_orig_var, get_fcsr_hold_call);
> +  tree hold_assign_orig = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> +                                  fcsr_orig_var, get_fcsr_hold_call, NULL, NULL);
>    tree hold_mod_val = build2 (BIT_AND_EXPR, MIPS_ATYPE_USI, fcsr_orig_var,
>                                build_int_cst (MIPS_ATYPE_USI, 0xf003));
> -  tree hold_assign_mod = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> -                                 fcsr_mod_var, hold_mod_val);
> +  tree hold_assign_mod = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> +                                 fcsr_mod_var, hold_mod_val, NULL, NULL);
>    tree set_fcsr_hold_call = build_call_expr (set_fcsr, 1, fcsr_mod_var);
>    tree hold_all = build2 (COMPOUND_EXPR, MIPS_ATYPE_USI,
>                            hold_assign_orig, hold_assign_mod);
> @@ -22454,8 +22454,8 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
>    *clear = build_call_expr (set_fcsr, 1, fcsr_mod_var);
>  
>    tree get_fcsr_update_call = build_call_expr (get_fcsr, 0);
> -  *update = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> -                    exceptions_var, get_fcsr_update_call);
> +  *update = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> +                    exceptions_var, get_fcsr_update_call, NULL, NULL);
>    tree set_fcsr_update_call = build_call_expr (set_fcsr, 1, fcsr_orig_var);
>    *update = build2 (COMPOUND_EXPR, void_type_node, *update,
>                      set_fcsr_update_call);

-- 
Xi Ruoyao 



Re: [committed] libstdc++: Implement LWG 2762 for std::unique_ptr::operator*

2021-06-28 Thread Jonathan Wakely via Gcc-patches
On Thu, 24 Jun 2021 at 22:11, Tim Song wrote:
>
> That example violates http://eel.is/c++draft/unique.ptr.runtime.general#3

Even though it's undefined I committed a workaround to allow it,
because it breaks LLVM:
https://gcc.gnu.org/pipermail/libstdc++/2021-June/052851.html
(I forgot to send that mail as a reply to this thread, sorry).

> On Thu, Jun 24, 2021 at 1:55 PM Patrick Palka via Gcc-patches
>  wrote:
> >
> > On Thu, 24 Jun 2021, Jonathan Wakely via Libstdc++ wrote:
> >
> > > The LWG issue proposes to add a conditional noexcept-specifier to
> > > std::unique_ptr's dereference operator. The issue is currently in
> > > Tentatively Ready status, but even if it isn't voted into the draft, we
> > > can do it as a conforming extension. This commit also adds a similar
> > > noexcept-specifier to operator[] for the unique_ptr partial
> > > specialization.
> >
> > The conditional noexcept added to unique_ptr::operator[] seems to break
> > the case where T is complete only after the fact:
> >
> >   struct T;
> >   extern std::unique_ptr p;
> >   auto& t = p[1];
> >   struct T { };
> >
> > /include/c++/12.0.0/bits/unique_ptr.h: In instantiation of ‘typename std::add_lvalue_reference<_Tp>::type std::unique_ptr<_Tp [], _Dp>::operator[](std::size_t) const [with _Tp = A; _Dp = std::default_delete; typename std::add_lvalue_reference<_Tp>::type = A&; std::size_t = long unsigned int]’:
> > testcase.cc:5:14:   required from here
> > /include/c++/12.0.0/bits/unique_ptr.h:658:48: error: invalid use of incomplete type ‘struct A’
> >   658 |       noexcept(noexcept(std::declval()[std::declval()]))
> >       | ~~~^
> > testcase.cc:3:8: note: forward declaration of ‘struct A’
> >     3 | struct A;
> >       |^
> >
> > >
> > > Also ensure that all dereference operators for shared_ptr are noexcept,
> > > and adds tests for the std::optional accessors modified by the issue,
> > > which were already noexcept in our implementation.
> > >
> > > Signed-off-by: Jonathan Wakely 
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > >   * include/bits/shared_ptr_base.h (__shared_ptr_access::operator[]):
> > >   Add noexcept.
> > >   * include/bits/unique_ptr.h (unique_ptr::operator*): Add
> > >   conditional noexcept as per LWG 2762.
> > >   * testsuite/20_util/shared_ptr/observers/array.cc: Check that
> > >   dereferencing cannot throw.
> > >   * testsuite/20_util/shared_ptr/observers/get.cc: Likewise.
> > >   * testsuite/20_util/optional/observers/lwg2762.cc: New test.
> > >   * testsuite/20_util/unique_ptr/lwg2762.cc: New test.
> > >
> > > Tested powerpc64le-linux. Committed to trunk.
> > >
> > >
>



Re: [committed] libstdc++: More workarounds in 17_intro/names.cc test [PR 97088]

2021-06-28 Thread Jonathan Wakely via Gcc-patches
On Mon, 28 Jun 2021 at 12:56, Christophe LYON wrote:
>
>
> On 25/06/2021 21:51, Jonathan Wakely via Libstdc++ wrote:
> > Conditionally #undef some more names that are used in system headers.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/97088
> >   * testsuite/17_intro/names.cc: Undef more names for newlib and
> >   also for arm-none-linux-gnueabi.
> >   * testsuite/experimental/names.cc: Disable PCH.
> >
> > Tested powerpc64le-linux. Committed to trunk.
>
> Hi Jonathan,
>
> After disabling PCH, we now have the following failures on arm-eabi,
> using newlib-3.3:
>
> FAIL: experimental/names.cc (test for excess errors)
> Excess errors:
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:194:
> error: expected ')' before ';' token
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:195:
> error: expected ')' before ';' token
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:196:
> error: expected ')' before ';' token
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:197:
> error: expected ')' before ';' token
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:198:
> error: expected ')' before ';' token
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:199:
> error: expected ')' before ';' token
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:200:
> error: expected ')' before ';' token
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:201:
> error: expected ')' before ';' token
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/time.h:110:
> error: expected unqualified-id before ';' token
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/time.h:110:
> error: expected ')' before ';' token
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/inttypes.h:323:
> error: expected ')' before ';' token
>
> All this is a bit of a mess :-)

Ugh, that's because I made a mess of the #if logic. My last change was
supposed to avoid exactly those errors, but I messed up.

I'm testing the attached patch (but not on arm or newlib), which should fix it.
commit 75f948f089cebfd00913635264e20610d0f2
Author: Jonathan Wakely 
Date:   Mon Jun 28 15:13:34 2021

libstdc++: Fix backwards logic in 17_intro/names.cc test [PR 97088]

I meant to undef the names that clash with newlib headers for newlib,
but I only undef'd them for non-newlib targets. This means they still
cause errors for newlib, and aren't tested for other targets.

This fixes the test to check those names for non-newlib targets, and to
undef them to avoid errors for newlib.

libstdc++-v3/ChangeLog:

PR libstdc++/97088
* testsuite/17_intro/names.cc: Fix #if condition for names used
by newlib headers.

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index 805c1002c3f..aca7a8e5812 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -123,6 +123,10 @@
 #define ptr (
 #endif
 
+// This clashes with newlib so don't use it.
+# define __lockable cannot be used as an identifier
+
+
 // Common template parameter names
 #define OutputIterator OutputIterator is not a reserved name
 #define InputIterator  InputIterator is not a reserved name
@@ -222,9 +226,9 @@
 #undef y
 #endif
 
-#if ! __has_include()
-// newlib's  defines __lockable as a macro, so we can't use it.
-# define __lockable cannot be used as an identifier
+#if __has_include()
+// newlib's  defines __lockable as a macro.
+#undef __lockable
 // newlib's  defines __tzrule_type with these members.
 #undef d
 #undef m


[committed] libstdc++: Remove redundant explicit instantiations

2021-06-28 Thread Jonathan Wakely via Gcc-patches
These function templates are explicitly specialized for char and wchar_t
streambufs, so the explicit instantiations do nothing. Remove them, to
avoid confusion.

libstdc++-v3/ChangeLog:

* include/bits/streambuf.tcc (__copy_streambufs_eof): Remove
explicit instantiation declarations.
* src/c++11/streambuf-inst.cc (__copy_streambufs_eof): Remove
explicit instantiation definitions.

Tested powerpc64le-linux. Committed to trunk.
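
The reasoning in this commit can be checked with a small standalone
example.  The name copy_kind is hypothetical and unrelated to the real
__copy_streambufs_eof.  C++11 [temp.explicit] says that when an
explicit instantiation appears after a declaration of an explicit
specialization for the same template arguments, the explicit
instantiation has no effect.  That is why removing such instantiations
changes nothing.

```cpp
#include <cassert>

// Primary template: the generic implementation.
template<typename CharT>
int copy_kind(CharT)
{ return 0; }

// Explicit specializations, analogous to the char and wchar_t
// specializations mentioned in the commit message.
template<>
int copy_kind<char>(char)
{ return 1; }

template<>
int copy_kind<wchar_t>(wchar_t)
{ return 2; }

// Redundant: these follow the explicit specializations above, so they
// instantiate nothing from the primary template and can be deleted.
template int copy_kind<char>(char);
template int copy_kind<wchar_t>(wchar_t);

// Other character types still use the primary template.
int generic_kind() { return copy_kind<char16_t>(u'x'); }
```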

commit 084635aa80daa45403aebd86712b2c61779c4173
Author: Jonathan Wakely 
Date:   Mon Jun 28 15:16:08 2021

libstdc++: Remove redundant explicit instantiations

These function templates are explicitly specialized for char and wchar_t
streambufs, so the explicit instantiations do nothing. Remove them, to
avoid confusion.

libstdc++-v3/ChangeLog:

* include/bits/streambuf.tcc (__copy_streambufs_eof): Remove
explicit instantiation declarations.
* src/c++11/streambuf-inst.cc (__copy_streambufs_eof): Remove
explicit instantiation definitions.

diff --git a/libstdc++-v3/include/bits/streambuf.tcc 
b/libstdc++-v3/include/bits/streambuf.tcc
index cbcfb0c790e..22464c4401c 100644
--- a/libstdc++-v3/include/bits/streambuf.tcc
+++ b/libstdc++-v3/include/bits/streambuf.tcc
@@ -147,25 +147,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // which are defined via explicit instantiations elsewhere.
 #if _GLIBCXX_EXTERN_TEMPLATE
   extern template class basic_streambuf;
+
   extern template
 streamsize
 __copy_streambufs(basic_streambuf*,
  basic_streambuf*);
-  extern template
-streamsize
-__copy_streambufs_eof(basic_streambuf*,
- basic_streambuf*, bool&);
 
 #ifdef _GLIBCXX_USE_WCHAR_T
   extern template class basic_streambuf;
+
   extern template
 streamsize
 __copy_streambufs(basic_streambuf*,
  basic_streambuf*);
-  extern template
-streamsize
-__copy_streambufs_eof(basic_streambuf*,
- basic_streambuf*, bool&);
 #endif
 #endif
 
diff --git a/libstdc++-v3/src/c++11/streambuf-inst.cc 
b/libstdc++-v3/src/c++11/streambuf-inst.cc
index 497f54e193f..c2c2ee9a688 100644
--- a/libstdc++-v3/src/c++11/streambuf-inst.cc
+++ b/libstdc++-v3/src/c++11/streambuf-inst.cc
@@ -40,11 +40,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 streamsize
 __copy_streambufs(basic_streambuf*, basic_streambuf*);
 
-  template
-streamsize
-__copy_streambufs_eof(basic_streambuf*,
- basic_streambuf*, bool&);
-
 #ifdef _GLIBCXX_USE_WCHAR_T
   // wstreambuf
   template class basic_streambuf;
@@ -52,11 +47,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 streamsize
 __copy_streambufs(basic_streambuf*, basic_streambuf*);
-
-  template
-streamsize
-__copy_streambufs_eof(basic_streambuf*,
- basic_streambuf*, bool&);
 #endif
 
 _GLIBCXX_END_NAMESPACE_VERSION


Re: [committed] libstdc++: More workarounds in 17_intro/names.cc test [PR 97088]

2021-06-28 Thread Jonathan Wakely via Gcc-patches
On Mon, 28 Jun 2021 at 15:20, Jonathan Wakely wrote:
>
> On Mon, 28 Jun 2021 at 12:56, Christophe LYON wrote:
> >
> >
> > On 25/06/2021 21:51, Jonathan Wakely via Libstdc++ wrote:
> > > Conditionally #undef some more names that are used in system headers.
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > >   PR libstdc++/97088
> > >   * testsuite/17_intro/names.cc: Undef more names for newlib and
> > >   also for arm-none-linux-gnueabi.
> > >   * testsuite/experimental/names.cc: Disable PCH.
> > >
> > > Tested powerpc64le-linux. Committed to trunk.
> >
> > Hi Jonathan,
> >
> > After disabling PCH, we now have the following failures on arm-eabi,
> > using newlib-3.3:
> >
> > FAIL: experimental/names.cc (test for excess errors)
> > Excess errors:
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:194:
> > error: expected ')' before ';' token
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:195:
> > error: expected ')' before ';' token
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:196:
> > error: expected ')' before ';' token
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:197:
> > error: expected ')' before ';' token
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:198:
> > error: expected ')' before ';' token
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:199:
> > error: expected ')' before ';' token
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:200:
> > error: expected ')' before ';' token
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:201:
> > error: expected ')' before ';' token
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/time.h:110:
> > error: expected unqualified-id before ';' token
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/time.h:110:
> > error: expected ')' before ';' token
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/inttypes.h:323:
> > error: expected ')' before ';' token
> >
> > All this is a bit of a mess :-)
>
> Ugh, that's because I made a mess of the #if logic. My last change was
> supposed to avoid exactly those errors, but I messed up.
>
> I'm testing the attached patch (but not on arm or newlib), which should fix 
> it.

That's pushed to trunk now (r12-1845).



pdp11: Fix warnings to allow compilation with a recent GCC and --enable-werror-always

2021-06-28 Thread Jan-Benedict Glaw
Hi Paul!

I'd like to install this patch to let the pdp11-aout configuration
build again with, e.g.:

../gcc/configure --target=pdp11-aout --enable-werror-always \
--enable-languages=all --disable-gcov --disable-shared \
--disable-threads --without-headers \
--prefix=/var/lib/laminar/run/gcc-pdp11-aout/5/toolchain-install

No testsuite run (yet? Maybe I'll add a bit), but I re-checked that the
generated code for some Hello-World-ish programs is unchanged and still
runs on a SIMH pdp11.

  * config/pdp11/pdp11.h (ASM_OUTPUT_SKIP): Fix signedness warning.
  * config/pdp11/pdp11.c (print_operand_address): Remove "register"
  keyword.
  (pdp11_initial_elimination_offset): Remove unused variable.
  (pdp11_cmp_length): Ditto.
  (pdp11_insn_cost): Ditto.
  (pushpop_regeq): Fix signedness warning.
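
The ASM_OUTPUT_SKIP change also wraps the macro body in
do { ... } while (0).  The old form already ends in ';', so a use such
as "if (c) ASM_OUTPUT_SKIP (f, n); else ..." leaves an else with no
matching if and fails to compile.  The do/while form expands to exactly
one statement.

The sketch below shows this.  SKIP_DIRECTIVE and emit_skip are
hypothetical names.  They borrow only the two directive formats from
the patch and write to a buffer instead of a FILE.  The sketch casts to
unsigned, the type %o formally consumes, whereas the patch itself casts
to int.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for ASM_OUTPUT_SKIP: same directive formats, but
// it writes into a char array BUF so the result can be inspected.  The
// do/while (0) wrapper makes the expansion a single statement that still
// requires a trailing ';' at the use site.
#define SKIP_DIRECTIVE(BUF, SIZE, DEC_ASM)                         \
  do {                                                             \
    if (DEC_ASM)                                                   \
      std::snprintf ((BUF), sizeof (BUF), "\t.blkb\t%o\n",         \
                     (unsigned) (SIZE));                           \
    else                                                           \
      std::snprintf ((BUF), sizeof (BUF), "\t.=.+ %#o\n",          \
                     (unsigned) (SIZE));                           \
  } while (0)

std::string emit_skip (unsigned long long size, bool dec_asm, bool enabled)
{
  char buf[32] = "";
  if (enabled)
    SKIP_DIRECTIVE (buf, size, dec_asm);  // one statement, so the else binds here
  else
    return std::string ();
  return buf;
}
```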

diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
index a21ae648439..9bc5e089f49 100644
--- a/gcc/config/pdp11/pdp11.h
+++ b/gcc/config/pdp11/pdp11.h
@@ -618,10 +618,12 @@ extern int current_first_parm_offset;
 fprintf (FILE, "\t.even\n")
 
 #define ASM_OUTPUT_SKIP(FILE,SIZE)  \
-  if (TARGET_DEC_ASM) \
-fprintf (FILE, "\t.blkb\t%o\n", (SIZE) & 0x);  \
-  else \
-fprintf (FILE, "\t.=.+ %#o\n", (SIZE) & 0x);
+  do { \
+if (TARGET_DEC_ASM)\
+  fprintf (FILE, "\t.blkb\t%o\n", (int) ((SIZE) & 0x));\
+else   \
+  fprintf (FILE, "\t.=.+ %#o\n", (int) ((SIZE) & 0x)); \
+  } while (0)
 
 /* This says how to output an assembler line
to define a global common symbol.  */
diff --git a/gcc/config/pdp11/pdp11.c b/gcc/config/pdp11/pdp11.c
index b663b43a29c..4cab3aee598 100644
--- a/gcc/config/pdp11/pdp11.c
+++ b/gcc/config/pdp11/pdp11.c
@@ -829,12 +829,12 @@ pdp11_asm_print_operand_punct_valid_p (unsigned char c)
 }
 
 void
-print_operand_address (FILE *file, register rtx addr)
+print_operand_address (FILE *file, rtx addr)
 {
-  register rtx breg;
+  rtx breg;
   rtx offset;
   int again = 0;
-  
+
  retry:
 
   switch (GET_CODE (addr))
@@ -1160,12 +1160,11 @@ pdp11_addr_cost (rtx addr, machine_mode mode, addr_space_t as ATTRIBUTE_UNUSED,
 static int
 pdp11_insn_cost (rtx_insn *insn, bool speed)
 {
-  int base_cost, i;
+  int base_cost;
   rtx pat, set, dest, src, src2;
   machine_mode mode;
-  const char *fmt;
   enum rtx_code op;
-  
+
   if (recog_memoized (insn) < 0)
 return 0;
 
@@ -1462,24 +1461,24 @@ bool
 pushpop_regeq (rtx op, int regno)
 {
   rtx addr;
-  
+
   /* False if not memory reference.  */
   if (GET_CODE (op) != MEM)
 return FALSE;
-  
+
   /* Get the address of the memory reference.  */
   addr = XEXP (op, 0);
 
   if (GET_CODE (addr) == MEM)
 addr = XEXP (addr, 0);
-
+
   switch (GET_CODE (addr))
 {
 case PRE_DEC:
 case POST_INC:
 case PRE_MODIFY:
 case POST_MODIFY:
-  return REGNO (XEXP (addr, 0)) == regno;
+  return REGNO (XEXP (addr, 0)) == (unsigned) regno;
 default:
   return FALSE;
 }
@@ -1771,8 +1770,7 @@ int
 pdp11_initial_elimination_offset (int from, int to)
 {
   /* Get the size of the register save area.  */
-  int spoff;
-  
+
   if (from == FRAME_POINTER_REGNUM && to == STACK_POINTER_REGNUM)
 return get_frame_size ();
   else if (from == ARG_POINTER_REGNUM && to == FRAME_POINTER_REGNUM)
@@ -2106,15 +2104,14 @@ pdp11_cmp_length (rtx *operands, int words)
 {
   rtx inops[2];
   rtx exops[4][2];
-  rtx lb[1];
   int i, len = 0;
 
   if (!reload_completed)
 return 2;
-  
+
   inops[0] = operands[0];
   inops[1] = operands[1];
-  
+
   pdp11_expand_operands (inops, exops, 2, words, NULL, big);
 
   for (i = 0; i < words; i++)


Okay for master?

MfG, JBG
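
The ASM_OUTPUT_SKIP hunk in this patch wraps the macro's if/else in do { ... } while (0). The sketch below illustrates why that idiom matters. It uses a hypothetical OUTPUT_SKIP macro that mirrors the patched one but writes into a buffer with snprintf (so the result is easy to check), a hypothetical `target_dec_asm` variable standing in for TARGET_DEC_ASM, and no SIZE mask; the (int) casts are kept as in the patch.

```c
#include <stdio.h>

/* Hypothetical stand-in for the TARGET_DEC_ASM target flag.  */
static int target_dec_asm = 1;

/* Shaped like the patched ASM_OUTPUT_SKIP, but writing into BUF.  The
   do { } while (0) wrapper makes the expansion exactly one statement.  */
#define OUTPUT_SKIP(BUF, LEN, SIZE)                          \
  do {                                                        \
    if (target_dec_asm)                                       \
      snprintf ((BUF), (LEN), "\t.blkb\t%o\n", (int) (SIZE)); \
    else                                                      \
      snprintf ((BUF), (LEN), "\t.=.+ %#o\n", (int) (SIZE));  \
  } while (0)

/* The macro used as the then-arm of an if/else.  Had it expanded to a bare
   "if ... else ...;", the ';' after the call would end the outer if and the
   "else" below would have no matching if (a syntax error).  */
static void
emit_skip (char *buf, size_t len, int have_skip, unsigned long size)
{
  if (have_skip)
    OUTPUT_SKIP (buf, len, size);
  else
    snprintf (buf, len, "; nothing\n");
}
```

With target_dec_asm set, a size of 8 produces "\t.blkb\t10\n"; with it cleared, the `#` flag yields "\t.=.+ 010\n".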



Re: [PATCH] Port GCC documentation to Sphinx

2021-06-28 Thread Joseph Myers
Are formatted manuals (HTML, PDF, man, info) corresponding to this patch 
version also available for review?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [committed] libstdc++: More workarounds in 17_intro/names.cc test [PR 97088]

2021-06-28 Thread Christophe LYON via Gcc-patches



On 28/06/2021 17:26, Jonathan Wakely wrote:

On Mon, 28 Jun 2021 at 15:20, Jonathan Wakely wrote:

On Mon, 28 Jun 2021 at 12:56, Christophe LYON wrote:


On 25/06/2021 21:51, Jonathan Wakely via Libstdc++ wrote:

Conditionally #undef some more names that are used in system headers.

libstdc++-v3/ChangeLog:

   PR libstdc++/97088
   * testsuite/17_intro/names.cc: Undef more names for newlib and
   also for arm-none-linux-gnueabi.
   * testsuite/experimental/names.cc: Disable PCH.

Tested powerpc64le-linux. Committed to trunk.

Hi Jonathan,

After disabling PCH, we now have the following failures on arm-eabi,
using newlib-3.3:

FAIL: experimental/names.cc (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:194: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:195: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:196: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:197: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:198: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:199: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:200: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/math.h:201: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/time.h:110: error: expected unqualified-id before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/time.h:110: error: expected ')' before ';' token
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include/inttypes.h:323: error: expected ')' before ';' token

All this is a bit of a mess :-)

Ugh, that's because I made a mess of the #if logic. My last change was
supposed to avoid exactly those errors, but I messed up.

I'm testing the attached patch (but not on arm or newlib), which should fix it.

That's pushed to trunk now (r12-1845).


Thanks, I'll let you know if there are any problems.


Christophe




Re: [Patch] Add 'default' to -foffload=; document that flag [PR67300]

2021-06-28 Thread Tobias Burnus

I accidentally deleted the libgomp part before posting the patch, hence
this repost.

(The change from -foffload= to -foffload-options= ensures that other
configured offload compilers, such as GCN, are also used; Thomas found
this issue. The original -foffload=nvptx-none=-latomic was added because
otherwise the GCN part caused build issues for Richard.)

Thus, this patch is v3 plus the invoke.texi fixes suggested by Sandra
(thanks!) and a ChangeLog; equivalently, it is v4 with the lost libgomp
changes re-added (plus a ChangeLog update).

I hope it is fine now.

Tobias
-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
Add 'default' to -foffload=; document that flag [PR67300]

As -foffload={options,targets,targets=options} is very convoluted,
it has been split into -foffload=targets (supporting the old syntax
for backward compatibility) and -foffload-options={options,target=options}.

Only the new syntax is documented.

Additionally, -foffload=default is supported, which can reset the
devices after -foffload=disable / -foffload=targets to the default,
if needed.
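
As a rough illustration of the two -foffload-options forms (options for all targets, or target-list=options), the following hypothetical C helpers split an argument and check whether a given target is selected. They assume, without checking gcc.c or lto-wrapper.c, that an argument beginning with '-' applies to every enabled offload target and that otherwise the first '=' separates a comma-separated target list from the options. This is a sketch, not GCC's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not GCC's code): split the argument of
   -foffload-options= into TARGETS (empty = all enabled targets) and
   *OPTIONS.  Returns false for malformed input.  */
static bool
split_offload_options (const char *arg, char *targets, size_t targets_size,
                       const char **options)
{
  assert (arg != NULL && targets != NULL && targets_size > 0);

  if (arg[0] == '-')                    /* e.g. "-lm": all targets */
    {
      targets[0] = '\0';
      *options = arg;
      return true;
    }

  const char *eq = strchr (arg, '=');   /* e.g. "nvptx-none=-latomic" */
  if (eq == NULL || eq == arg || eq[1] == '\0')
    return false;
  size_t len = (size_t) (eq - arg);
  if (len >= targets_size)
    return false;
  memcpy (targets, arg, len);
  targets[len] = '\0';
  *options = eq + 1;
  return true;
}

/* Is TARGET named in the comma-separated LIST?  An empty LIST selects
   every target.  */
static bool
offload_target_selected (const char *list, const char *target)
{
  if (list[0] == '\0')
    return true;
  size_t tlen = strlen (target);
  for (const char *p = list;;)
    {
      const char *comma = strchr (p, ',');
      size_t n = comma != NULL ? (size_t) (comma - p) : strlen (p);
      if (n == tlen && strncmp (p, target, n) == 0)
        return true;
      if (comma == NULL)
        return false;
      p = comma + 1;
    }
}
```

Under these assumptions, -foffload-options=nvptx-none=-latomic (the form the libgomp tests switch to) yields the target list "nvptx-none" and the options "-latomic", while -foffload-options=-lm yields an empty list, meaning all targets.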

gcc/ChangeLog:

* common.opt (-foffload=): Update description.
	(-foffload-options=): New.
* doc/invoke.texi (C Language Options): Sort options
	alphabetically in the optlist and in the description itself.
	(-foffload, -foffload-options): New.
* gcc.c (check_offload_target_name): New, split off from
	handle_foffload_option.
(check_foffload_target_names): New.
(handle_foffload_option): Handle -foffload=default.
(driver_handle_option): Update for -foffload-options.
* lto-opts.c (lto_write_options): Use -foffload-options
	instead of -foffload.
* lto-wrapper.c (merge_and_complain, append_offload_options):
	Likewise.
* opts.c (common_handle_option): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/reduction-16.c: Replace
	-foffload=nvptx-none= by -foffload-options=nvptx-none= to
	avoid disabling other offload targets.
* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.
* testsuite/libgomp.c-c++-common/reduction-6.c: Likewise.
* testsuite/libgomp.c/target-44.c: Likewise.

 gcc/common.opt |  10 +-
 gcc/doc/invoke.texi| 281 -
 gcc/gcc.c  | 100 ++--
 gcc/lto-opts.c |   3 +-
 gcc/lto-wrapper.c  |  10 +-
 gcc/opts.c |   2 +-
 .../testsuite/libgomp.c-c++-common/reduction-16.c  |   2 +-
 .../testsuite/libgomp.c-c++-common/reduction-5.c   |   2 +-
 .../testsuite/libgomp.c-c++-common/reduction-6.c   |   2 +-
 libgomp/testsuite/libgomp.c/target-44.c|   2 +-
 10 files changed, 254 insertions(+), 160 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index a1353e06bdc..a695a8c5964 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2095,9 +2095,15 @@ fnon-call-exceptions
 Common Var(flag_non_call_exceptions) Optimization
 Support synchronous non-call exceptions.
 
+; -foffload= is documented
+; -foffload== is supported for backward compatibility
 foffload=
-Common Driver Joined MissingArgError(options or targets missing after %qs)
--foffload==	Specify offloading targets and options for them.
+Driver Joined MissingArgError(targets missing after %qs)
+-foffload=	Specify offloading targets
+
+foffload-options=
+Common Driver Joined MissingArgError(options or targets=options missing after %qs)
+-foffload==	Specify options for the offloading targets
 
 foffload-abi=
 Common Joined RejectNegative Enum(offload_abi) Var(flag_offload_abi) Init(OFFLOAD_ABI_UNSET)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index af2ce189fae..f8e41d41801 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -197,17 +197,17 @@ in the following sections.
 
 @item C Language Options
 @xref{C Dialect Options,,Options Controlling C Dialect}.
-@gccoptlist{-ansi  -std=@var{standard}  -fgnu89-inline @gol
--fpermitted-flt-eval-methods=@var{standard} @gol
--aux-info @var{filename}  -fallow-parameterless-variadic-functions @gol
--fno-asm  -fno-builtin  -fno-builtin-@var{function}  -fgimple@gol
--fhosted  -ffreestanding @gol
+@gccoptlist{-ansi  -std=@var{standard}  -aux-info @var{filename} @gol
+-fallow-parameterless-variadic-functions  -fno-asm  @gol
+-fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch @gol
+-ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted @gol
+-flax-vector-conversions  -fms-extensions @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
+-foffload=@var{arg} -foffload-options=@var{arg} @gol
 -fopenmp  -fopenmp-simd @gol
--fms-extensions  -fplan9-extensions  -fsso-struct=@var{endianness} @gol
--fallow-single-precision  -fcond-mismatch  -flax-vector-conversions

Re: pdp11: Fix warnings to allow compilation with a recent GCC and --enable-werror-always

2021-06-28 Thread Koning, Paul via Gcc-patches



> On Jun 28, 2021, at 11:33 AM, Jan-Benedict Glaw  wrote:
> 
> Hi Paul!
> 
> I'd like to install this patch to let the pdp11-aout configuration
> build again with eg.
> 
> ../gcc/configure --target=pdp11-aout --enable-werror-always \
>   --enable-languages=all --disable-gcov --disable-shared \
>   --disable-threads --without-headers \
>   --prefix=/var/lib/laminar/run/gcc-pdp11-aout/5/toolchain-install
> 
> No testsuite (yet? Maybe I'd add a bit), but re-checked some Hello
> World'ish code for no changes and it still runs on a SIMH pdp11.
> ...
> Okay for master?
> 
> MfG, JBG

Yes, thanks!

The test suite "compile" section isn't completely clean yet but it mostly 
works.  Execution is more problematic at this point.

paul



[PATCH 0/2] Ranger-based backwards threader implementation.

2021-06-28 Thread Aldy Hernandez via Gcc-patches
This is the ranger-based backwards threader.  It is divided into two
parts: the solver and the path discovery bits.

The solver is generic enough that it may be of use to other passes,
so it's been abstracted into its own separate class/file.  Andrew and
I have already gone over it, so I don't think a review is necessary.
Besides, it's technically an extension of the ranger infrastructure.

On the other hand, the path discovery bits could benefit from the
watchful eye of the jump threading experts.

Documenting the solver in a [ranger-tech] post is on my TODO list,
as I think it would be useful as an example of GORI as a general
tool, outside the VRP world.

As I have mentioned elsewhere, I have gone through each test and
documented the reasons why they were adjusted (when useful).  The
reviewer(s) may benefit from looking at the test notes.

I have added a --param=threader-mode={ranger,legacy} option, which I
hope to remove shortly after.  It has been useful for diagnosing
issues in the past, though perhaps not so much now.  I've left it
in case there's a remote interest in using it during stage1, but
removing it could be a huge cleanup to tree-ssa-threadbackward.c.

If/when accepted, I will open 2-3 PRs with the XFAILed tests as
requested.  I am still working on distilling a C counterpart for
the libphobos missing thread edge.  It'll hopefully be ready by the
time the review is done.

A version of this patchset with the verification code has
been tested on x86-64, ppc64, ppc64le, and aarch64 (all Linux).

I am currently re-testing on x86-64 Linux, but will not re-test on the
rest of the architectures because... OMG, aarch64 is so slow!

Thanks.
Aldy

Aldy Hernandez (2):
  Implement basic block path solver.
  Backwards jump threader rewrite with ranger.

 gcc/Makefile.in   |   6 +
 gcc/flag-types.h  |   7 +
 gcc/params.opt|  17 +
 .../g++.dg/debug/dwarf2/deallocator.C |   3 +-
 gcc/testsuite/gcc.c-torture/compile/pr83510.c |  33 ++
 gcc/testsuite/gcc.dg/Wrestrict-22.c   |   3 +
 gcc/testsuite/gcc.dg/loop-unswitch-2.c|   2 +-
 gcc/testsuite/gcc.dg/old-style-asm-1.c|   5 +-
 gcc/testsuite/gcc.dg/pr68317.c|   4 +-
 gcc/testsuite/gcc.dg/pr97567-2.c  |   2 +-
 gcc/testsuite/gcc.dg/predict-9.c  |   4 +-
 gcc/testsuite/gcc.dg/shrink-wrap-loop.c   |  53 ++
 gcc/testsuite/gcc.dg/sibcall-1.c  |  10 +
 .../gcc.dg/tree-ssa/builtin-sprintf-3.c   |   5 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21001.c   |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/pr21294.c   |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/pr21417.c   |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21458-2.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21563.c   |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr49039.c   |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c |   2 +-
 .../gcc.dg/tree-ssa/ranger-threader-1.c   |  20 +
 .../gcc.dg/tree-ssa/ranger-threader-2.c   |  39 ++
 .../gcc.dg/tree-ssa/ranger-threader-3.c   |  41 ++
 .../gcc.dg/tree-ssa/ranger-threader-4.c   |  83 +++
 gcc/testsuite/gcc.dg/tree-ssa/split-path-4.c  |   4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-11.c   |   2 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-12.c   |   2 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-14.c   |   1 +
 .../gcc.dg/tree-ssa/ssa-dom-thread-18.c   |   5 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-6.c|   4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c|   1 +
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-48.c|   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-12.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/vrp02.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp03.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp05.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp06.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp07.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp09.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp19.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp20.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp33.c |   2 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-16.c |   7 +
 .../gcc.target/i386/avx2-vect-aggressive.c|   2 +-
 gcc/tree-ssa-path-solver.cc   | 310 
 gcc/tree-ssa-path-solver.h|  85 
 gcc/tree-ssa-threadbackward.c | 475 +-
 gcc/tree-ssa-threadedge.c |  20 +-
 gcc/tree-ssa-threadedge.h |   3 +-
 gcc/tree-ssa-threadupdate.c   |  12 +-
 gcc/tree-ssa-threadupdate.h   |   2 +-
 .../libgomp.graphite/force-parallel-4.c   |   1 +
 .../lib

[PATCH 1/2] Implement basic block path solver.

2021-06-28 Thread Aldy Hernandez via Gcc-patches
This is the main basic block path solver for use in the ranger-based
backwards threader.  Given a path of BBs, the class can solve the final
conditional or any SSA name used in calculating the final conditional.

The main API is:

// This class is a basic block path solver.  Given a set of BBs
// indicating a path through the CFG, range_in_path() will return the
// range of an SSA as if the BBs in the path would have been executed
// in order.
//
// Only SSA names passed in IMPORTS are precomputed, and can be
// queried.
//
// Note that the blocks are in reverse order, thus the exit block is
// path[0].

class path_solver
{
public:
  path_solver (gimple_ranger &ranger);
  virtual ~path_solver ();
  void precompute_ranges (const vec *path,
  const bitmap_head *imports);
  void range_in_path (irange &, tree name);
  void range_in_path (irange &, gimple *);
};
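
To make "the range of an SSA name as if the BBs in the path had been executed in order" concrete, here is a toy model in C. It tracks a single integer variable through a path stored in the same reverse order as path_solver (path[0] is the exit block). The toy_block effects, the names, and the forward walk are assumptions for illustration only; the real solver works on GIMPLE through the ranger and computes ranges for the names in IMPORTS rather than simulating one variable.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Closed interval [lo, hi] for one integer variable X.
   lo > hi means no value survives: the path is infeasible.  */
typedef struct { long lo, hi; } toy_range;

/* Hypothetical effect of one basic block on X: the block executes
   X += add, and the path only continues when min <= X <= max
   (think of the condition guarding the next edge on the path).  */
typedef struct { long add, min, max; } toy_block;

/* Range of X after executing the blocks of PATH in order, starting from
   ENTRY.  As with path_solver, PATH is reversed: PATH[0] is the exit block
   and PATH[LEN - 1] the entry block.  Assumes the additions never overflow.  */
static toy_range
toy_range_in_path (const toy_block *const *path, size_t len, toy_range entry)
{
  assert (len > 1);   /* set_path also insists on at least two blocks */
  toy_range r = entry;
  for (size_t i = len; i-- > 0; )   /* entry block first, exit block last */
    {
      const toy_block *bb = path[i];
      r.lo += bb->add;
      r.hi += bb->add;
      if (r.lo < bb->min)
        r.lo = bb->min;
      if (r.hi > bb->max)
        r.hi = bb->max;
    }
  return r;
}
```

For example, starting from [0, 100] with an entry block that only continues when X >= 10, a middle block that adds 5, and an exit block whose condition requires X <= 50, the walk yields [10, 100], then [15, 105], then [15, 50]. A range with lo > hi is the toy's counterpart of an unreachable path, which is what makes a jump threadable.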

gcc/ChangeLog:

* Makefile.in (OBJS): Add tree-ssa-path-solver.o.
* tree-ssa-path-solver.cc: New file.
* tree-ssa-path-solver.h: New file.
---
 gcc/Makefile.in |   1 +
 gcc/tree-ssa-path-solver.cc | 310 
 gcc/tree-ssa-path-solver.h  |  85 ++
 3 files changed, 396 insertions(+)
 create mode 100644 gcc/tree-ssa-path-solver.cc
 create mode 100644 gcc/tree-ssa-path-solver.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index ebf26442992..66cc5f9529e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1644,6 +1644,7 @@ OBJS = \
tree-ssa-loop.o \
tree-ssa-math-opts.o \
tree-ssa-operands.o \
+   tree-ssa-path-solver.o \
tree-ssa-phiopt.o \
tree-ssa-phiprop.o \
tree-ssa-pre.o \
diff --git a/gcc/tree-ssa-path-solver.cc b/gcc/tree-ssa-path-solver.cc
new file mode 100644
index 000..1e2c37cff78
--- /dev/null
+++ b/gcc/tree-ssa-path-solver.cc
@@ -0,0 +1,310 @@
+/* Basic block path solver.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   Contributed by Aldy Hernandez .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "cfganal.h"
+#include "value-range.h"
+#include "gimple-range.h"
+#include "tree-pretty-print.h"
+#include "tree-ssa-path-solver.h"
+#include "ssa.h"
+
+// Internal construct to help facilitate debugging of solver.
+#define DEBUG_SOLVER getenv("DEBUG")
+
+path_solver::path_solver (gimple_ranger &ranger)
+  : m_ranger (ranger)
+{
+  m_cache = new ssa_global_cache;
+  m_has_cache_entry = BITMAP_ALLOC (NULL);
+  m_path = NULL;
+}
+
+path_solver::~path_solver ()
+{
+  BITMAP_FREE (m_has_cache_entry);
+  delete m_cache;
+}
+
+// Mark cache entry for NAME as unused.
+
+void
+path_solver::clear_cache (tree name)
+{
+  unsigned v = SSA_NAME_VERSION (name);
+  bitmap_clear_bit (m_has_cache_entry, v);
+}
+
+// If NAME has a cache entry, return it in R, and return TRUE.
+
+inline bool
+path_solver::get_cache (irange &r, tree name)
+{
+  if (!gimple_range_ssa_p (name))
+return get_global_range_query ()->range_of_expr (r, name);
+
+  unsigned v = SSA_NAME_VERSION (name);
+  if (bitmap_bit_p (m_has_cache_entry, v))
+return m_cache->get_global_range (r, name);
+
+  return false;
+}
+
+// Set the cache entry for NAME to R.
+
+void
+path_solver::set_cache (const irange &r, tree name)
+{
+  unsigned v = SSA_NAME_VERSION (name);
+  bitmap_set_bit (m_has_cache_entry, v);
+  m_cache->set_global_range (name, r);
+}
+
+bool
+path_solver::range_of_expr (irange &r, tree name, gimple *stmt)
+{
+  if (!irange::supports_type_p (TREE_TYPE (name)))
+return false;
+
+  if (get_cache (r, name))
+return true;
+
+  if (stmt && range_defined_in_block (r, name, gimple_bb (stmt)))
+{
+  set_cache (r, name);
+  return true;
+}
+
+  // Otherwise return varying.
+  r.set_varying (TREE_TYPE (name));
+  // ?? Is this set_cache necessary?
+  set_cache (r, name);
+  return true;
+}
+
+// Initialize the current path to PATH.  The current block is set to
+// the entry block to the path.
+//
+// Note that the blocks are in reverse order, so the exit block is
+// path[0].
+
+void
+path_solver::set_path (const vec *path)
+{
+  gcc_checking_assert (path->length () > 1);
+  m_path = path;
+  m_pos = m_path->length () - 1;
+  bitmap_clear (m_has_cache_entry);
+}
+
+// Return th

[PATCH 2/2] Backwards jump threader rewrite with ranger.

2021-06-28 Thread Aldy Hernandez via Gcc-patches
This is a rewrite of the backwards threader with a ranger-based solver.

The code is divided into two parts: the path solver in
tree-ssa-path-solver.*, and the path discovery in
tree-ssa-threadbackward.c.

The legacy code is still available with --param=threader-mode=legacy,
but will be removed shortly after.

gcc/ChangeLog:

* Makefile.in (tree-ssa-loop-im.o-warn): New.
* flag-types.h (enum threader_mode): New.
* params.opt: Add entry for --param=threader-mode.
* tree-ssa-threadbackward.c (THREADER_ITERATIVE_MODE): New.
(class back_threader): New.
(back_threader::back_threader): New.
(back_threader::~back_threader): New.
(back_threader::maybe_register_path): New.
(back_threader::find_taken_edge): New.
(back_threader::find_taken_edge_switch): New.
(back_threader::find_taken_edge_cond): New.
(back_threader::resolve_def): New.
(back_threader::resolve_phi): New.
(back_threader::find_paths_to_names): New.
(back_threader::find_paths): New.
(dump_path): New.
(debug): New.
(thread_jumps::find_jump_threads_backwards): Call ranger threader.
(thread_jumps::find_jump_threads_backwards_with_ranger): New.
(pass_thread_jumps::execute): Abstract out code...
(try_thread_blocks): ...here.
* tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges):
Abstract out threading candidate code to...
(single_succ_to_potentially_threadable_block): ...here.
* tree-ssa-threadedge.h (single_succ_to_potentially_threadable_block):
New.
* tree-ssa-threadupdate.c (register_jump_thread): Return boolean.
* tree-ssa-threadupdate.h (class jump_thread_path_registry):
Return bool from register_jump_thread.

libgomp/ChangeLog:

* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for
threader.
* testsuite/libgomp.graphite/force-parallel-8.c: Same.

gcc/testsuite/ChangeLog:

* g++.dg/debug/dwarf2/deallocator.C: Adjust for threader.
* gcc.c-torture/compile/pr83510.c: Same.
* gcc.dg/Wrestrict-22.c: Same.
* gcc.dg/loop-unswitch-2.c: Same.
* gcc.dg/old-style-asm-1.c: Same.
* gcc.dg/pr68317.c: Same.
* gcc.dg/pr97567-2.c: Same.
* gcc.dg/predict-9.c: Same.
* gcc.dg/shrink-wrap-loop.c: Same.
* gcc.dg/sibcall-1.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-3.c: Same.
* gcc.dg/tree-ssa/pr21001.c: Same.
* gcc.dg/tree-ssa/pr21294.c: Same.
* gcc.dg/tree-ssa/pr21417.c: Same.
* gcc.dg/tree-ssa/pr21458-2.c: Same.
* gcc.dg/tree-ssa/pr21563.c: Same.
* gcc.dg/tree-ssa/pr49039.c: Same.
* gcc.dg/tree-ssa/pr61839_1.c: Same.
* gcc.dg/tree-ssa/pr61839_3.c: Same.
* gcc.dg/tree-ssa/pr77445-2.c: Same.
* gcc.dg/tree-ssa/split-path-4.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
* gcc.dg/tree-ssa/ssa-fre-48.c: Same.
* gcc.dg/tree-ssa/ssa-thread-11.c: Same.
* gcc.dg/tree-ssa/ssa-thread-12.c: Same.
* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
* gcc.dg/tree-ssa/vrp02.c: Same.
* gcc.dg/tree-ssa/vrp03.c: Same.
* gcc.dg/tree-ssa/vrp05.c: Same.
* gcc.dg/tree-ssa/vrp06.c: Same.
* gcc.dg/tree-ssa/vrp07.c: Same.
* gcc.dg/tree-ssa/vrp09.c: Same.
* gcc.dg/tree-ssa/vrp19.c: Same.
* gcc.dg/tree-ssa/vrp20.c: Same.
* gcc.dg/tree-ssa/vrp33.c: Same.
* gcc.dg/vect/bb-slp-16.c: Same.
* gcc.target/i386/avx2-vect-aggressive.c: Same.
* gcc.dg/tree-ssa/ranger-threader-1.c: New test.
* gcc.dg/tree-ssa/ranger-threader-2.c: New test.
* gcc.dg/tree-ssa/ranger-threader-3.c: New test.
* gcc.dg/tree-ssa/ranger-threader-4.c: New test.
---
 gcc/Makefile.in   |   5 +
 gcc/flag-types.h  |   7 +
 gcc/params.opt|  17 +
 .../g++.dg/debug/dwarf2/deallocator.C |   3 +-
 gcc/testsuite/gcc.c-torture/compile/pr83510.c |  33 ++
 gcc/testsuite/gcc.dg/Wrestrict-22.c   |   3 +
 gcc/testsuite/gcc.dg/loop-unswitch-2.c|   2 +-
 gcc/testsuite/gcc.dg/old-style-asm-1.c|   5 +-
 gcc/testsuite/gcc.dg/pr68317.c|   4 +-
 gcc/testsuite/gcc.dg/pr97567-2.c  |   2 +-
 gcc/testsuite/gcc.dg/predict-9.c  |   4 +-
 gcc/testsuite/gcc.dg/shrink-wrap-loop.c   |  53 ++
 gcc/testsuite/gcc.dg/sibcall-1.c  |  10 +
 .../gcc.dg/tree-ssa/builtin-sprintf-3.c   |   5 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21001.c   |   1 +
 gcc/testsuit

libgomp.fortran/defaultmap-8.f90: Fix non-shared memory handling

2021-06-28 Thread Tobias Burnus

The following runs into the problem that the pointer
is privatized but not the pointer target (in the C
sense, i.e. it affects both allocatables and pointers
in Fortran). Thus, when running it with non-shared
memory offloading, the pointer points to an invalid
address.

I think the fix is obvious (albeit unfortunate).

(Tested on x86-64 with -foffload=nvptx-none and
-foffload=disable.)

Tobias

libgomp.fortran/defaultmap-8.f90: Fix non-shared memory handling

Disable some more parts of the test as firstprivate does not work yet
due to PR fortran/90742.

libgomp/
	* testsuite/libgomp.fortran/defaultmap-8.f90 (bar): Determine whether
	target has shared memory and disable some scalar pointer/allocatable
	checks if not as firstprivate does not work.
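
The shared-memory probe described in this ChangeLog (set a flag to false on the host, map it `to` the device, and set it to true inside the target region) can be written the same way in C with OpenMP. The function name below is hypothetical. Because map(to:) never copies the value back, the host only observes true when the target region wrote the host's own storage, as with host fallback; with separate device memory only the device copy changes. Compiled without -fopenmp, the pragma is ignored and the function simply returns true.

```c
#include <stdbool.h>

/* Hypothetical helper: does the OpenMP target region run in the host's
   own memory (e.g. host fallback)?  */
static bool
target_has_shared_memory (void)
{
  bool shared_memory = false;

  /* map(to:) copies the flag to the device but never back.  With separate
     device memory only the device copy becomes true, so the host still
     sees false; with shared memory the host variable itself is set.  */
#pragma omp target map(to: shared_memory)
  shared_memory = true;

  return shared_memory;
}
```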

diff --git a/libgomp/testsuite/libgomp.fortran/defaultmap-8.f90 b/libgomp/testsuite/libgomp.fortran/defaultmap-8.f90
index ddf5057..54f4b2e 100644
--- a/libgomp/testsuite/libgomp.fortran/defaultmap-8.f90
+++ b/libgomp/testsuite/libgomp.fortran/defaultmap-8.f90
@@ -205,6 +205,7 @@ subroutine bar (ea1, ea2, ep1, ep2, eat1, eat2, et1, et2, ei1, ei2)
   pointer :: ep1, ep2, ep3
   target :: eat1, eat2, eat3, et1, et2, et3
   optional :: ea1, ep1, eat1, et1, ei1
+  logical :: shared_memory
 
   allocate(ea3, eat3, ep3)
 
@@ -212,19 +213,28 @@ subroutine bar (ea1, ea2, ep1, ep2, eat1, eat2, et1, et2, ei1, ei2)
   eat1 = 2; eat2 = 2; eat3 = 2; et1 = 2; et2 = 2; et3 = 2
   ei1 = 2; ei2 = 2; ei3 = 2
 
+  shared_memory = .false.
+  !$omp target map(to: shared_memory)
+shared_memory = .true.
+  !$omp end target
+
   ! While here 'scalar' implies nonallocatable/nonpointer and
   ! the target attribute plays no role.
   !$omp target defaultmap(tofrom:scalar) defaultmap(firstprivate:allocatable) &
-  !$omp&   defaultmap(none:aggregate) defaultmap(firstprivate:pointer)
-if (ea1 /= 2) stop 91
-if (ea2 /= 2) stop 92
-if (ea3 /= 2) stop 93
-if (ep1 /= 2) stop 94
-if (ep2 /= 2) stop 95
-if (ep3 /= 2) stop 96
-if (eat1 /= 2) stop 97
-if (eat2 /= 2) stop 98
-if (eat3 /= 2) stop 99
+  !$omp&   defaultmap(none:aggregate) defaultmap(firstprivate:pointer) &
+  !$omp&   map(always, to: shared_memory)
+if (shared_memory) then
+  ! Due to fortran/90742 this fails when doing non-shared memory offloading
+  if (ea1 /= 2) stop 91
+  if (ea2 /= 2) stop 92
+  if (ea3 /= 2) stop 93
+  if (ep1 /= 2) stop 94
+  if (ep2 /= 2) stop 95
+  if (ep3 /= 2) stop 96
+  if (eat1 /= 2) stop 97
+  if (eat2 /= 2) stop 98
+  if (eat3 /= 2) stop 99
+end if
 if (et1 /= 2) stop 100
 if (et2 /= 2) stop 101
 if (et3 /= 2) stop 102
@@ -232,8 +242,11 @@ subroutine bar (ea1, ea2, ep1, ep2, eat1, eat2, et1, et2, ei1, ei2)
 if (ei2 /= 2) stop 104
 if (ei3 /= 2) stop 105
 ep1 => null(); ep2 => null(); ep3 => null()
-ea1 = 1; ea2 = 1; ea3 = 1
-eat1 = 1; eat2 = 1; eat3 = 1
+if (shared_memory) then
+  ! Due to fortran/90742 this fails when doing non-shared memory offloading
+  ea1 = 1; ea2 = 1; ea3 = 1
+  eat1 = 1; eat2 = 1; eat3 = 1
+end if
 et1 = 1; et2 = 1; et3 = 1
 ei1 = 1; ei2 = 1; ei3 = 1
   !$omp end target


[COMMITTED V10 1/7] dwarf: externalize some DWARF internals for needs of CTF/BTF

2021-06-28 Thread Jose E. Marchesi via Gcc-patches
This patch externalizes some internal DIE structures and their attributes
for the use of DWARF-based debug formats like CTF and BTF.

The following functions which were previously defined as static in
dwarf2out.c are now non-static, and extern prototypes for them have
been added to dwarf2out.h:

- get_AT
- AT_int
- AT_class
- AT_loc
- get_AT_ref
- get_AT_string
- get_AT_class
- AT_unsigned
- get_AT_unsigned
- get_AT_flag
- add_name_attribute
- new_die_raw
- base_type_die
- lookup_decl_die
- get_AT_file

Note that this patch doesn't change the names of these functions, to
avoid a massive renaming in dwarf2out.c, but in the future we probably
want these functions to sport a dw_* prefix.

Also, some type definitions have been moved from dwarf2out.c to
dwarf2out.h:

- dw_attr_node
- struct dwarf_file_data

Finally, three new accessor functions have been added to dwarf2out.c
with prototypes in dwarf2out.h:

- dw_get_die_child
- dw_get_die_sib
- dw_get_die_tag
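
These accessors expose the tree links of a DIE. As the dwarf2out.c comment quoted in the diff notes, the children of a DIE form a circular list linked by die_sib, and die_child points to the node *before* the first child, i.e. the last one. The toy model below (struct and function names are hypothetical, not GCC's dw_die_struct API) shows how appending keeps that invariant and how a consumer such as a CTF/BTF generator might walk the children in order, in the style of dwarf2out.c's FOR_EACH_CHILD loop.

```c
#include <assert.h>
#include <stddef.h>

/* Toy DIE, not GCC's dw_die_struct: children form a circular list through
   SIB, and CHILD points to the last child, i.e. the node before the first.  */
struct toy_die
{
  int tag;
  struct toy_die *child;
  struct toy_die *sib;
};

/* Append C as the new last child of PARENT.  */
static void
toy_add_child (struct toy_die *parent, struct toy_die *c)
{
  assert (c->sib == NULL);
  if (parent->child != NULL)
    {
      c->sib = parent->child->sib;   /* new last child links to the first */
      parent->child->sib = c;
    }
  else
    c->sib = c;                      /* a lone child is its own sibling */
  parent->child = c;
}

/* Store the children's tags in first-to-last order and return how many
   children there are.  One step from CHILD reaches the first child; the
   walk stops once it has visited CHILD (the last) again.  */
static size_t
toy_child_tags (const struct toy_die *parent, int *tags, size_t max)
{
  size_t n = 0;
  const struct toy_die *c = parent->child;
  if (c != NULL)
    do
      {
        c = c->sib;
        if (n < max)
          tags[n] = c->tag;
        n++;
      }
    while (c != parent->child);
  return n;
}
```

Pointing at the last child makes both operations cheap: appending needs no list walk, and the first child is always one sibling step away.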

2021-06-28  Jose E. Marchesi  

* dwarf2out.c (AT_class): Function is no longer static.
(AT_int): Likewise.
(AT_unsigned): Likewise.
(AT_loc): Likewise.
(get_AT): Likewise.
(get_AT_string): Likewise.
(get_AT_flag): Likewise.
(get_AT_unsigned): Likewise.
(get_AT_ref): Likewise.
(new_die_raw): Likewise.
(lookup_decl_die): Likewise.
(base_type_die): Likewise.
(add_name_attribute): Likewise.
(add_AT_int): Likewise.
(add_AT_unsigned): Likewise.
(add_AT_loc): Likewise.
(dw_get_die_tag): New function.
(dw_get_die_child): Likewise.
(dw_get_die_sib): Likewise.
(struct dwarf_file_data): Move from here to dwarf2out.h
(struct dw_attr_struct): Likewise.
* dwarf2out.h: Analogous changes.
---
 gcc/dwarf2out.c | 81 +++--
 gcc/dwarf2out.h | 44 +++
 2 files changed, 82 insertions(+), 43 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 9a91981acb0..e10006c5081 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1332,12 +1332,6 @@ dwarf2out_switch_text_section (void)
 /* And now, the subset of the debugging information support code necessary
for emitting location expressions.  */
 
-/* Data about a single source file.  */
-struct GTY((for_user)) dwarf_file_data {
-  const char * filename;
-  int emitted_number;
-};
-
 /* Describe an entry into the .debug_addr section.  */
 
 enum ate_kind {
@@ -3123,17 +3117,6 @@ maybe_reset_location_view (rtx_insn *insn, dw_line_info_table *table)
 RESET_NEXT_VIEW (table->view);
 }
 
-/* Each DIE attribute has a field specifying the attribute kind,
-   a link to the next attribute in the chain, and an attribute value.
-   Attributes are typically linked below the DIE they modify.  */
-
-typedef struct GTY(()) dw_attr_struct {
-  enum dwarf_attribute dw_attr;
-  dw_val_node dw_attr_val;
-}
-dw_attr_node;
-
-
 /* The Debugging Information Entry (DIE) structure.  DIEs form a tree.
The children of each node form a circular list linked by
die_sib.  die_child points to the node *before* the "first" child node.  */
@@ -3711,14 +3694,11 @@ static const char *dwarf_form_name (unsigned);
 static tree decl_ultimate_origin (const_tree);
 static tree decl_class_context (tree);
 static void add_dwarf_attr (dw_die_ref, dw_attr_node *);
-static inline enum dw_val_class AT_class (dw_attr_node *);
 static inline unsigned int AT_index (dw_attr_node *);
 static void add_AT_flag (dw_die_ref, enum dwarf_attribute, unsigned);
 static inline unsigned AT_flag (dw_attr_node *);
 static void add_AT_int (dw_die_ref, enum dwarf_attribute, HOST_WIDE_INT);
-static inline HOST_WIDE_INT AT_int (dw_attr_node *);
 static void add_AT_unsigned (dw_die_ref, enum dwarf_attribute, unsigned HOST_WIDE_INT);
-static inline unsigned HOST_WIDE_INT AT_unsigned (dw_attr_node *);
 static void add_AT_double (dw_die_ref, enum dwarf_attribute,
   HOST_WIDE_INT, unsigned HOST_WIDE_INT);
 static inline void add_AT_vec (dw_die_ref, enum dwarf_attribute, unsigned int,
@@ -3733,7 +3713,6 @@ static inline dw_die_ref AT_ref (dw_attr_node *);
 static inline int AT_ref_external (dw_attr_node *);
 static inline void set_AT_ref_external (dw_attr_node *, int);
 static void add_AT_loc (dw_die_ref, enum dwarf_attribute, dw_loc_descr_ref);
-static inline dw_loc_descr_ref AT_loc (dw_attr_node *);
 static void add_AT_loc_list (dw_die_ref, enum dwarf_attribute,
 dw_loc_list_ref);
 static inline dw_loc_list_ref AT_loc_list (dw_attr_node *);
@@ -3750,12 +3729,7 @@ static void add_AT_macptr (dw_die_ref, enum dwarf_attribute, const char *);
 static void add_AT_range_list (dw_die_ref, enum dwarf_attribute,
unsigned long, bool);
 static inline const char *AT_lbl (dw_attr_node *);
-static dw_attr_node *get_AT (dw_die_ref, enum dwarf_attribute);
 static con

[COMMITTED V10 2/7] dejagnu: modularize gcc-dg-debug-runtest a bit

2021-06-28 Thread Jose E. Marchesi via Gcc-patches
Move some functionality into a procedure of its own. This is only so that when
the patch for CTF comes along, the gcc-dg-debug-runtest procedure looks a bit
more uniform.

gcc/testsuite/ChangeLog:

* lib/gcc-dg.exp (gcc-dg-target-supports-debug-format): New procedure.
---
 gcc/testsuite/lib/gcc-dg.exp | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index fce0989cd9c..c7722ba07da 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -621,18 +621,27 @@ proc gcc-dg-runtest { testcases flags default-extra-flags } {
 }
 }
 
-proc gcc-dg-debug-runtest { target_compile trivial opt_opts testcases } {
+# Check if the target system supports the debug format
+proc gcc-dg-target-supports-debug-format { target_compile trivial type } {
 global srcdir subdir
 
+set comp_output [$target_compile \
+   "$srcdir/$subdir/$trivial" "trivial.S" assembly \
+   "additional_flags=$type"]
+if { ! [string match "*: target system does not support the * debug format*" \
+   $comp_output] } {
+   remove-build-file "trivial.S"
+   return 1
+}
+return 0
+}
+
+proc gcc-dg-debug-runtest { target_compile trivial opt_opts testcases } {
 if ![info exists DEBUG_TORTURE_OPTIONS] {
set DEBUG_TORTURE_OPTIONS ""
foreach type {-gdwarf-2 -gstabs -gstabs+ -gxcoff -gxcoff+} {
-   set comp_output [$target_compile \
-   "$srcdir/$subdir/$trivial" "trivial.S" assembly \
-   "additional_flags=$type"]
-   if { ! [string match "*: target system does not support the * debug format*" \
-   $comp_output] } {
-   remove-build-file "trivial.S"
+   if [expr [gcc-dg-target-supports-debug-format \
+ $target_compile $trivial $type]] {
foreach level {1 "" 3} {
if { ($type == "-gdwarf-2") && ($level != "") } {
lappend DEBUG_TORTURE_OPTIONS [list "${type}" "-g${level}"]
-- 
2.25.0.2.g232378479e



[COMMITTED V10 0/7] Support for the CTF and BTF debug formats

2021-06-28 Thread Jose E. Marchesi via Gcc-patches
[Changes from V9:

 All the patches have been OKed, provided a few things were fixed
 before pushing.  These points, raised by Richard Biener and Jason
 Merrill, have been all addressed as part of the following changes:

 - No dwarf2int.h header is introduced anymore in the patch
   series.  Instead, we are exporting the needed interface in the
   existing dwarf2out.h.  We intend to do some refactoring of
   dwarf2out.[ch] in the near future.
 - Make it explicit in the manual that different debug formats can
   coexist.
 - Add code comment in dwarf2out_source_line.
 - Add missing function level comment for btf_debuginfo_p.
 - Use uint32_t for a couple of variables that used unsigned int
   before.
 - Use XNEWVEC instead of xmalloc.
 - Adhere to 80 chars line length.
 - A few other cosmetic fixes for upstreaming.

 Thanks a lot to the reviewers!]

Hi people!

Last year we submitted a first patch series introducing support for
the CTF debugging format in GCC [1].  We got a lot of feedback that
prompted us to change the approach used to generate the debug info,
and this patch series is the result of that.

This series also adds support for the BTF debug format, which is needed
by the BPF backend (more on this below).

This implementation works, but there are several points that need
discussion and agreement with the upstream community, as they impact
the way debugging options work.  We are also proposing a way to add
additional debugging formats in the future.  See below for more
details.

Finally, a patch makes the BPF GCC backend use the DWARF debug
hooks in order to make -gbtf available to it.

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-05/msg01297.html

About CTF
=========

CTF is a debugging format designed in order to express C types in a
very compact way.  The key is compactness and simplicity.  For more
information see:

- CTF specification
  http://www.esperi.org.uk/~oranix/ctf/ctf-spec.pdf

- Compact C-Type support in the GNU toolchain (talk + slides)
  https://linuxplumbersconf.org/event/4/contributions/396/

- On type de-duplication in CTF (talk + slides)
  https://linuxplumbersconf.org/event/7/contributions/725/

About BTF
=========

BTF is a debugging format, similar to CTF, that is used in the Linux
kernel as the debugging format for BPF programs.  From the kernel
documentation:

"BTF (BPF Type Format) is the metadata format which encodes the debug
 info related to BPF program/map. The name BTF was used initially to
 describe data types. The BTF was later extended to include function
 info for defined subroutines, and line info for source/line
 information."

Supporting BTF in GCC is important because compiled BPF programs
(which GCC supports as a target) require the type information in order
to be loaded and run in diverse kernel versions.  This mechanism is
known as CO-RE (compile-once, run-everywhere) and is described in the
"Update of the BPF support in the GNU Toolchain" talk mentioned below.

BTF is documented in the Linux kernel documentation tree:
- linux/Documentation/bpf/btf.rst

CTF in the GNU Toolchain
========================

During the last year we have been working on adding support for CTF to
several components of the GNU toolchain:

- binutils support is already upstream.  It supports linking objects
  with CTF information with full type de-duplication.

- GDB support is to be sent upstream very shortly.  It makes the
  debugger capable of using the CTF information whenever available.
  This is useful in cases where DWARF has been stripped out but CTF is
  kept.

- GCC support is being discussed and submitted in this series.

Overview of the Implementation
==============================

  dwarf2out.c

The enabled debug formats are hooked in dwarf2out_early_finish.

  dwarf2int.h

Internal interface that exports a few functions and data types
defined in dwarf2out.c.

  dwarf2ctf.c

Code that transforms the internal GCC DWARF DIEs into CTF container
structures.  This file uses the dwarf2int.h interface.

  ctfc.c
  ctfc.h

These two files implement the "CTF container", which is shared
among CTF and BTF, due to the many similarities between both
formats.

  ctfout.c

Code that emits assembler with the .ctf section data, from the CTF
container.

  btfout.c

Code that emits assembler with the .BTF section data, from the CTF
container.

From debug hooks to debug formats
=================================

Our first attempt at adding CTF to GCC used the obvious approach of
adding a new set of debug hooks as defined in gcc/debug.h.

During our first interaction with the upstream community we were told
to _not_ use debug hooks, because these are to be obsoleted at some
point.  It was suggested that we instead hook our handlers (which
processed type TREE nodes producing CTF types from them) somewhere
else.  So we did.

However at the time we were also facing the need to support BTF, which
is another type-related debug format needed 

[COMMITTED V10 5/7] CTF/BTF documentation

2021-06-28 Thread Jose E. Marchesi via Gcc-patches
This commit documents the new command line options introduced by the
CTF and BTF debug formats.

2021-06-28  Indu Bhagat  

* doc/invoke.texi: Document the CTF and BTF debug info options.
---
 gcc/doc/invoke.texi | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index af2ce189fae..2dc6a2106d9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -466,6 +466,7 @@ Objective-C and Objective-C++ Dialects}.
 @item Debugging Options
 @xref{Debugging Options,,Options for Debugging Your Program}.
 @gccoptlist{-g  -g@var{level}  -gdwarf  -gdwarf-@var{version} @gol
+-gbtf -gctf  -gctf@var{level} @gol
 -ggdb  -grecord-gcc-switches  -gno-record-gcc-switches @gol
 -gstabs  -gstabs+  -gstrict-dwarf  -gno-strict-dwarf @gol
 -gas-loc-support  -gno-as-loc-support @gol
@@ -9647,7 +9648,9 @@ in the ``exploded graph'' and diagnostics associated with them.
 @cindex debugging information options
 
 To tell GCC to emit extra information for use by a debugger, in almost 
-all cases you need only to add @option{-g} to your other options.
+all cases you need only to add @option{-g} to your other options.  Some debug
+formats can co-exist (like DWARF with CTF) when each of them is enabled
+explicitly by adding the respective command line option to your other options.
 
 GCC allows you to use @option{-g} with
 @option{-O}.  The shortcuts taken by optimized code may occasionally
@@ -9708,6 +9711,33 @@ other DWARF-related options such as
 @option{-fno-dwarf2-cfi-asm}) retain a reference to DWARF Version 2
 in their names, but apply to all currently-supported versions of DWARF.
 
+@item -gbtf
+@opindex gbtf
+Request BTF debug information.  BTF is the default debugging format for the
+eBPF target.  On other targets, like x86, BTF debug information can be
+generated along with DWARF debug information when both of the debug formats are
+enabled explicitly via their respective command line options.
+
+@item -gctf
+@itemx -gctf@var{level}
+@opindex gctf
+Request CTF debug information and use level to specify how much CTF debug
+information should be produced.  If @option{-gctf} is specified
+without a value for level, the default level of CTF debug information is 2.
+
+CTF debug information can be generated along with DWARF debug information when
+both of the debug formats are enabled explicitly via their respective command
+line options.
+
+Level 0 produces no CTF debug information at all.  Thus, @option{-gctf0}
+negates @option{-gctf}.
+
+Level 1 produces CTF information for tracebacks only.  This includes callsite
+information, but does not include type information.
+
+Level 2 produces type information for entities (functions, data objects etc.)
+at file-scope or global-scope only.
+
 @item -gstabs
 @opindex gstabs
 Produce debugging information in stabs format (if that is supported),
-- 
2.25.0.2.g232378479e



[COMMITTED V10 7/7] libiberty: copy over .BTF section when using LTO

2021-06-28 Thread Jose E. Marchesi via Gcc-patches
libiberty/ChangeLog:

* simple-object.c (handle_lto_debug_sections): Copy over .BTF section.
---
 libiberty/simple-object.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libiberty/simple-object.c b/libiberty/simple-object.c
index 909995dd166..facbf94fd09 100644
--- a/libiberty/simple-object.c
+++ b/libiberty/simple-object.c
@@ -307,6 +307,9 @@ handle_lto_debug_sections (const char *name, int rename)
   /* Copy over .ctf section under the same name if present.  */
   else if (strcmp (name, ".ctf") == 0)
 return strcpy (newname, name);
+  /* Copy over .BTF section under the same name if present.  */
+  else if (strcmp (name, ".BTF") == 0)
+return strcpy (newname, name);
   free (newname);
   return NULL;
 }
-- 
2.25.0.2.g232378479e



[COMMITTED V10 6/7] Enable BTF generation in the BPF backend

2021-06-28 Thread Jose E. Marchesi via Gcc-patches
This patch changes the BPF GCC backend in order to use the DWARF debug
hooks and therefore enables the user to generate BTF debugging
information with -gbtf.  Generating BTF is crucial when compiling BPF
programs, since the CO-RE (compile-once, run-everywhere) mechanism
used by the kernel BPF loader relies on it.

Note that since in eBPF it is not possible to unwind frames due to the
restrictive nature of the target architecture, we are disabling the
generation of CFA in this target.

2021-06-28  David Faust 

* config/bpf/bpf.c (bpf_expand_prologue): Do not mark insns as
frame related.
(bpf_expand_epilogue): Likewise.
* config/bpf/bpf.h (DWARF2_FRAME_INFO): Define to 0.
Do not define DBX_DEBUGGING_INFO.
---
 gcc/config/bpf/bpf.c |  4 
 gcc/config/bpf/bpf.h | 12 ++--
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
index 126d4a2798d..e635f9edb40 100644
--- a/gcc/config/bpf/bpf.c
+++ b/gcc/config/bpf/bpf.c
@@ -349,7 +349,6 @@ bpf_expand_prologue (void)
  hard_frame_pointer_rtx,
  fp_offset - 8));
  insn = emit_move_insn (mem, gen_rtx_REG (DImode, regno));
- RTX_FRAME_RELATED_P (insn) = 1;
  fp_offset -= 8;
}
}
@@ -364,7 +363,6 @@ bpf_expand_prologue (void)
 {
   insn = emit_move_insn (stack_pointer_rtx,
 hard_frame_pointer_rtx);
-  RTX_FRAME_RELATED_P (insn) = 1;
 
   if (size > 0)
{
@@ -372,7 +370,6 @@ bpf_expand_prologue (void)
 gen_rtx_PLUS (Pmode,
   stack_pointer_rtx,
   GEN_INT (-size;
- RTX_FRAME_RELATED_P (insn) = 1;
}
 }
 }
@@ -412,7 +409,6 @@ bpf_expand_epilogue (void)
  hard_frame_pointer_rtx,
  fp_offset - 8));
  insn = emit_move_insn (gen_rtx_REG (DImode, regno), mem);
- RTX_FRAME_RELATED_P (insn) = 1;
  fp_offset -= 8;
}
}
diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
index 80195cea5b2..82be0c3e190 100644
--- a/gcc/config/bpf/bpf.h
+++ b/gcc/config/bpf/bpf.h
@@ -235,17 +235,9 @@ enum reg_class
 
 / Debugging Info /
 
-/* We cannot support DWARF2 because of the limitations of eBPF.  */
+/* In eBPF it is not possible to unwind frames. Disable CFA.  */
 
-/* elfos.h insists in using DWARF.  Undo that here.  */
-#ifdef DWARF2_DEBUGGING_INFO
-# undef DWARF2_DEBUGGING_INFO
-#endif
-#ifdef PREFERRED_DEBUGGING_TYPE
-# undef PREFERRED_DEBUGGING_TYPE
-#endif
-
-#define DBX_DEBUGGING_INFO
+#define DWARF2_FRAME_INFO 0
 
 / Stack Layout and Calling Conventions.  */
 
-- 
2.25.0.2.g232378479e



[COMMITTED V10 4/7] CTF/BTF testsuites

2021-06-28 Thread Jose E. Marchesi via Gcc-patches
This commit adds a new testsuite for the CTF debug format.

2021-06-28  Indu Bhagat  
David Faust  

gcc/testsuite/

* lib/gcc-dg.exp (gcc-dg-frontend-supports-ctf): New procedure.
(gcc-dg-debug-runtest): Add -gctf support.
* gcc.dg/debug/btf/btf-1.c: New test.
* gcc.dg/debug/btf/btf-2.c: Likewise.
* gcc.dg/debug/btf/btf-anonymous-struct-1.c: Likewise.
* gcc.dg/debug/btf/btf-anonymous-union-1.c: Likewise.
* gcc.dg/debug/btf/btf-array-1.c: Likewise.
* gcc.dg/debug/btf/btf-bitfields-1.c: Likewise.
* gcc.dg/debug/btf/btf-bitfields-2.c: Likewise.
* gcc.dg/debug/btf/btf-bitfields-3.c: Likewise.
* gcc.dg/debug/btf/btf-cvr-quals-1.c: Likewise.
* gcc.dg/debug/btf/btf-enum-1.c: Likewise.
* gcc.dg/debug/btf/btf-forward-1.c: Likewise.
* gcc.dg/debug/btf/btf-function-1.c: Likewise.
* gcc.dg/debug/btf/btf-function-2.c: Likewise.
* gcc.dg/debug/btf/btf-int-1.c: Likewise.
* gcc.dg/debug/btf/btf-pointers-1.c: Likewise.
* gcc.dg/debug/btf/btf-struct-1.c: Likewise.
* gcc.dg/debug/btf/btf-typedef-1.c: Likewise.
* gcc.dg/debug/btf/btf-union-1.c: Likewise.
* gcc.dg/debug/btf/btf-variables-1.c: Likewise.
* gcc.dg/debug/btf/btf.exp: Likewise.
* gcc.dg/debug/ctf/ctf-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-anonymous-struct-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-anonymous-union-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-array-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-array-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-array-3.c: Likewise.
* gcc.dg/debug/ctf/ctf-array-4.c: Likewise.
* gcc.dg/debug/ctf/ctf-attr-mode-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-attr-used-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-bitfields-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-bitfields-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-bitfields-3.c: Likewise.
* gcc.dg/debug/ctf/ctf-bitfields-4.c: Likewise.
* gcc.dg/debug/ctf/ctf-complex-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-cvr-quals-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-cvr-quals-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-cvr-quals-3.c: Likewise.
* gcc.dg/debug/ctf/ctf-cvr-quals-4.c: Likewise.
* gcc.dg/debug/ctf/ctf-enum-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-enum-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-file-scope-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-float-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-forward-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-forward-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-func-index-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-function-pointers-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-function-pointers-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-function-pointers-3.c: Likewise.
* gcc.dg/debug/ctf/ctf-functions-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-int-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-objt-index-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-pointers-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-pointers-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-preamble-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-skip-types-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-skip-types-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-skip-types-3.c: Likewise.
* gcc.dg/debug/ctf/ctf-skip-types-4.c: Likewise.
* gcc.dg/debug/ctf/ctf-skip-types-5.c: Likewise.
* gcc.dg/debug/ctf/ctf-skip-types-6.c: Likewise.
* gcc.dg/debug/ctf/ctf-str-table-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-struct-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-struct-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-struct-array-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-struct-pointer-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-struct-pointer-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-typedef-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-typedef-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-typedef-3.c: Likewise.
* gcc.dg/debug/ctf/ctf-typedef-struct-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-typedef-struct-2.c: Likewise.
* gcc.dg/debug/ctf/ctf-typedef-struct-3.c: Likewise.
* gcc.dg/debug/ctf/ctf-union-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-variables-1.c: Likewise.
* gcc.dg/debug/ctf/ctf-variables-2.c: Likewise.
* gcc.dg/debug/ctf/ctf.exp: Likewise.
---
 gcc/testsuite/gcc.dg/debug/btf/btf-1.c|  6 ++
 gcc/testsuite/gcc.dg/debug/btf/btf-2.c| 10 +++
 .../gcc.dg/debug/btf/btf-anonymous-struct-1.c | 23 ++
 .../gcc.dg/debug/btf/btf-anonymous-union-1.c  | 23 ++
 gcc/testsuite/gcc.dg/debug/btf/btf-array-1.c  | 31 +++
 .../gcc.dg/debug/btf/btf-bitfields-1.c| 34 
 .../gcc.dg/debug/btf/btf-bitfields-2.c| 26 ++
 .../gcc.dg/debug/btf/btf-bitfields-3.c| 43 ++
 .../gcc.dg/debug/btf/

Re: [PATCH] libbacktrace: fix DWARF support for XCOFF files

2021-06-28 Thread Ian Lance Taylor via Gcc-patches
On Mon, Jun 28, 2021 at 12:27 AM CHIGOT, CLEMENT
 wrote:
>
> A few things were missing to correctly handle DWARF files on AIX.
>
> Moreover, the previous base_address was the starting address of
> the .text section of a loaded file instead of the difference
> between this starting address and the starting address in
> the file itself (unloaded).
>
> libbacktrace/ChangeLog:
> 2021-06-28  Clément Chigot  
>
> * xcoff.c (SSUBTYP_DWRNGES): New define.
> (xcoff_add): Use correct XCOFF DWARF section subtype
> for DEBUG_RANGES. Remove lineoff workaround.
> Adjust base_address.
> (xcoff_initialize_syminfo): Adapt to new base_address.
> (xcoff_lookup_pc): Likewise.
> (xcoff_initialize_fileline): Likewise.

Thanks.  Committed as follows.

Ian
commit b5261d823243111ab733b2da25c50f361fe5de3f
Author: Ian Lance Taylor 
Date:   Mon Jun 28 10:34:58 2021 -0700

libbacktrace: improve XCOFF support

libbacktrace/ChangeLog:
2021-06-28  Clément Chigot  

* xcoff.c (SSUBTYP_DWRNGES): New define.
(xcoff_add): Use correct XCOFF DWARF section subtype
for DEBUG_RANGES. Remove lineoff workaround.
Adjust base_address.
(xcoff_initialize_syminfo): Adapt to new base_address.
(xcoff_lookup_pc): Likewise.
(xcoff_initialize_fileline): Likewise.

diff --git a/libbacktrace/xcoff.c b/libbacktrace/xcoff.c
index 1e65c00553c..2ded8f0024f 100644
--- a/libbacktrace/xcoff.c
+++ b/libbacktrace/xcoff.c
@@ -133,6 +133,7 @@ typedef struct {
 #define SSUBTYP_DWARNGE 0x5 /* DWARF aranges section.  */
 #define SSUBTYP_DWABREV 0x6 /* DWARF abbreviation section.  */
 #define SSUBTYP_DWSTR  0x7 /* DWARF strings section.  */
+#define SSUBTYP_DWRNGES 0x8 /* DWARF ranges section.  */
 
 /* XCOFF symbol.  */
 
@@ -586,7 +587,6 @@ xcoff_symname (const b_xcoff_syment *asym,
 static int
 xcoff_initialize_syminfo (struct backtrace_state *state,
  uintptr_t base_address,
- const b_xcoff_scnhdr *sects,
  const b_xcoff_syment *syms, size_t nsyms,
  const unsigned char *strtab, size_t strtab_size,
  backtrace_error_callback error_callback, void *data,
@@ -628,8 +628,7 @@ xcoff_initialize_syminfo (struct backtrace_state *state,
{
  const b_xcoff_auxent *aux = (const b_xcoff_auxent *) (asym + 1);
  xcoff_symbols[j].name = xcoff_symname (asym, strtab, strtab_size);
- xcoff_symbols[j].address = base_address + asym->n_value
-  - sects[asym->n_scnum - 1].s_paddr;
+ xcoff_symbols[j].address = base_address + asym->n_value;
  /* x_fsize will be 0 if there is no debug information.  */
  xcoff_symbols[j].size = aux->x_fcn.x_fsize;
  ++j;
@@ -767,7 +766,7 @@ xcoff_lookup_pc (struct backtrace_state *state ATTRIBUTE_UNUSED,
   lineno = (const b_xcoff_lineno *) lineptr;
   if (lineno->l_lnno == 0)
break;
-  if (pc <= fdata->base_address + lineno->l_addr.l_paddr - fn->sect_base)
+  if (pc <= fdata->base_address + lineno->l_addr.l_paddr)
break;
   match = lnnoptr;
   lnno = lineno->l_lnno;
@@ -1002,7 +1001,7 @@ xcoff_initialize_fileline (struct backtrace_state *state,
fn->name = xcoff_symname (fsym, strtab, strtab_size);
fn->filename = filename;
fn->sect_base = sects[fsym->n_scnum - 1].s_paddr;
-   fn->pc = base_address + fsym->n_value - fn->sect_base;
+   fn->pc = base_address + fsym->n_value;
fn->size = fsize;
fn->lnno = lnno;
fn->lnnoptr = lnnoptr;
@@ -1153,8 +1152,16 @@ xcoff_add (struct backtrace_state *state, int descriptor, off_t offset,
 
  stext = &sects[i];
 
-  /* AIX ldinfo_textorg includes the XCOFF headers.  */
-  base_address = (exe ? XCOFF_AIX_TEXTBASE : base_address) + stext->s_scnptr;
+  /* base_address represents the difference between the
+ virtual memory address of the shared object or a loaded
+ executable and the offset of that object in the file
+ from which it was loaded.
+ On AIX, virtual address is either fixed for executable
+ or given by ldinfo.  This address will include the XCOFF
+ headers.  */
+  base_address = ((exe ? XCOFF_AIX_TEXTBASE : base_address)
+ + stext->s_scnptr
+ - stext->s_paddr);
 
   lnnoptr = stext->s_lnnoptr;
   nlnno = stext->s_nlnno;
@@ -1212,7 +1219,7 @@ xcoff_add (struct backtrace_state *state, int descriptor, off_t offset,
   if (sdata == NULL)
goto fail;
 
-  if (!xcoff_initialize_syminfo (state, base_address, sects,
+  if (!xcoff_initialize_syminfo (state, base_address,
 syms_view.data, fhdr.f_nsyms,
 str_view.data, str_size,
 

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-28 Thread Martin Sebor via Gcc-patches

On 6/28/21 2:07 AM, Richard Biener wrote:

On Sat, Jun 26, 2021 at 12:36 AM Martin Sebor  wrote:


On 6/25/21 4:11 PM, Jason Merrill wrote:

On 6/25/21 4:51 PM, Martin Sebor wrote:

On 6/1/21 3:38 PM, Jason Merrill wrote:

On 6/1/21 3:56 PM, Martin Sebor wrote:

On 5/27/21 2:53 PM, Jason Merrill wrote:

On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:

On 4/27/21 8:04 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor 
wrote:


On 4/27/21 1:58 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
 wrote:


PR 90904 notes that auto_vec is unsafe to copy and assign because
the class manages its own memory but doesn't define (or delete)
either special function.  Since I first ran into the problem,
auto_vec has grown a move ctor and move assignment from
a dynamically-allocated vec but still no copy ctor or copy
assignment operator.

The attached patch adds the two special functions to auto_vec along
with a few simple tests.  It makes auto_vec safe to use in containers
that expect copyable and assignable element types and passes bootstrap
and regression testing on x86_64-linux.


The question is whether we want such uses to appear since those
can be quite inefficient?  Thus the option is to delete those
operators?


I would strongly prefer the generic vector class to have the properties
expected of any other generic container: copyable and assignable.  If
we also want another vector type with this restriction, I suggest adding
another "noncopyable" type and making that property explicit in its name.
I can submit one in a followup patch if you think we need one.


I'm not sure (and not strictly against the copy and assign).
Looking around
I see that vec<> does not do deep copying.  Making auto_vec<> do it
might be surprising (I added the move capability to match how vec<>
is used - as "reference" to a vector)


The vec base classes are special: they have no ctors at all (because
of their use in unions).  That's something we might have to live with
but it's not a model to follow in ordinary containers.


I don't think we have to live with it anymore, now that we're
writing C++11.


The auto_vec class was introduced to fill the need for a conventional
sequence container with a ctor and dtor.  The missing copy ctor and
assignment operators were an oversight, not a deliberate feature.
This change fixes that oversight.

The revised patch also adds a copy ctor/assignment to the auto_vec
primary template (that's also missing it).  In addition, it adds
a new class called auto_vec_ncopy that disables copying and
assignment as you prefer.


Hmm, adding another class doesn't really help with the confusion
richi mentions.  And many uses of auto_vec will pass them as vec,
which will still do a shallow copy.  I think it's probably better
to disable the copy special members for auto_vec until we fix vec<>.


There are at least a couple of problems that get in the way of fixing
all of vec to act like a well-behaved C++ container:

1) The embedded vec has a trailing "flexible" array member with its
instances having different size.  They're initialized by memset and
copied by memcpy.  The class can't have copy ctors or assignments
but it should disable/delete them instead.

2) The heap-based vec is used throughout GCC with the assumption of
shallow copy semantics (not just as function arguments but also as
members of other such POD classes).  This can be changed by providing
copy and move ctors and assignment operators for it, and also for
some of the classes in which it's a member and that are used with
the same assumption.

3) The heap-based vec::block_remove() assumes its elements are PODs.
That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
and tree-vect-patterns.c).

I took a stab at both and while (1) is easy, (2) is shaping up to
be a big and tricky project.  Tricky because it involves using
std::move in places where what's moved is subsequently still used.
I can keep plugging away at it but it won't change the fact that
the embedded and heap-based vecs have different requirements.

It doesn't seem to me that having a safely copyable auto_vec needs
to be put on hold until the rat's nest above is untangled.  It won't
make anything worse than it is.  (I have a project that depends on
a sane auto_vec working).

A couple of alternatives to solving this are to use std::vector or
write an equivalent vector class just for GCC.


It occurs to me that another way to work around the issue of passing
an auto_vec by value as a vec, and thus doing a shallow copy, would
be to add a vec ctor taking an auto_vec, and delete that.  This would
mean if you want to pass an auto_vec to a vec interface, it needs to
be by reference.  We might as well do the same for operator=, though
that isn't as important.


Thanks, that sounds like a good idea.  Attached is an implementation
of this change.  Since the auto_vec copy ctor and assignment have
been deleted by someone else i

[PING][PATCH 1/4] introduce diagnostic infrastructure changes (PR 98512)

2021-06-28 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572839.html

On 6/15/21 5:00 PM, Martin Sebor wrote:

On 6/11/21 11:04 AM, David Malcolm wrote:

On Thu, 2021-06-10 at 17:26 -0600, Martin Sebor wrote:

This diff introduces the diagnostic infrastructure changes to support
controlling warnings at any call site in the inlining stack and printing
the inlining context without the %K and %G directives.


Thanks for working on this, looks very promising.


Improve warning suppression for inlined functions.

Resolves:
PR middle-end/98871 - Cannot silence -Wmaybe-uninitialized at declaration site
PR middle-end/98512 - #pragma GCC diagnostic ignored ineffective in conjunction with alias attribute


Am I right in thinking that you add test coverage for both of these in
patch 2 of the kit?


Yes, the tests depend on the changes in patch 2 (some existing tests
fail with just patch 1 applied because the initial location passed
to warning_t() is different than with it).





gcc/ChangeLog:

* diagnostic.c (update_inlining_context): New.
(update_effective_level_from_pragmas): Handle inlining context.
(diagnostic_report_diagnostic): Same.
* diagnostic.h (struct diagnostic_info): Add ctor.
(struct diagnostic_context): Add members.
* tree-diagnostic.c (get_inlining_locations): New.
(set_inlining_location): New.
(tree_diagnostics_defaults): Set new callback pointers.


[..snip...]

@@ -1204,7 +1256,7 @@ diagnostic_report_diagnostic 
(diagnostic_context *context,
    /* We do this to avoid giving the message for 
-pedantic-errors.  */

    orig_diag_kind = diagnostic->kind;
  }
-
+


Stray whitespace change?  Though it looks like a fix of a stray space,
so not a big deal.


    if (diagnostic->kind == DK_NOTE && context->inhibit_notes_p)
  return false;


[..snip...]


diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index 1b9d6b1f64d..b95ee23dda0 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -87,6 +87,10 @@ enum diagnostics_extra_output_kind
 list in diagnostic.def.  */
  struct diagnostic_info
  {
+  diagnostic_info ()
+    : message (), richloc (), metadata (), x_data (), kind (), 
option_index ()

+  { }
+


Why does the patch add this ctor?


The new code relies on x_data being initially null, and to make it so
I considered two alternatives: explicitly initializing the struct or
adding a ctor.  I had started with the former but wound up with the
latter after a few ICEs.





    /* Text to be formatted.  */
    text_info message;
@@ -343,6 +347,32 @@ struct diagnostic_context
    /* Callback for final cleanup.  */
    void (*final_cb) (diagnostic_context *context);
+
+  /* The inlining context of the diagnostic (may have just one
+ element if a diagnostic is not for an inlined expression).  */
+  struct inlining_ctx
+  {
+    void reset ()
+    {
+  ilocs.release ();
+  loc = UNKNOWN_LOCATION;
+  ao = NULL;
+  allsyslocs = false;
+    }
+
+    /* Locations along the inlining stack.  */
+    auto_vec ilocs;
+    /* The locus of the diagnostic. */
+    location_t loc;
+    /* The abstract origin of the location.  */
+    void *ao;
+    /* Set of every ILOCS element is in a system header.  */
+    bool allsyslocs;
+  } ictx;


Why is the inlining ctx part of the diagnostic_context?  That feels
strange to me. This inlining information relates to a particular
diagnostic, so it seems more appropriate to me that it should be part
of the diagnostic_info (which might thus necessitate having a ctor for
diagnostic_info).  Doing that might avoid the need for "reset", if I'm
right in assuming that getting the data is done once per diagnostic
during diagnostic_report_diagnostic.


I thought that's what you'd suggested when we spoke, but I must have
misremembered or misunderstood.  I agree it fits better in
the diagnostic_info and I've moved it there.



Alternatively, could this be state that's created on the stack during
diagnostic_report_diagnostic and passed around by pointer as another
parameter?  (putting it in diagnostic_info might be simplest though)


Yes, that sounds good to me too.



Maybe rename it to "inlining_info"?

How involved would it be to make it be a class with private fields?


Not too involved.  It would involve adding accessors and modifiers
for all of them.  I would normally be in favor of it but I don't
think it's worth the effort for such a small struct that's a member
of another that doesn't use proper encapsulation.  If/when the other
classes in this area are encapsulated it might be a good time to do
it for this class too.



Can the field names have "m_" prefixes, please?


Done.




+  /* Callbacks to get and set the inlining context.  */


Probably should spell out in the comment here that doing so requires
knowledge of trees, which is why it's a callback (to avoid diagnostic.c
from having to know about trees).


Done.




+  void (*get_locations_cb)(diagnostic_context *, diagnostic_info *);
+  void (*set_locatio

Re: [PATCH 2/3] Fix IEEE 128-bit min/max test.

2021-06-28 Thread Michael Meissner via Gcc-patches
On Fri, Jun 25, 2021 at 12:46:37PM -0500, Segher Boessenkool wrote:
> On Thu, Jun 17, 2021 at 04:11:40PM -0400, Michael Meissner wrote:
> > On Thu, Jun 17, 2021 at 01:11:58PM -0500, Segher Boessenkool wrote:
> > > > --- a/gcc/testsuite/gcc.target/powerpc/float128-minmax.c
> > > > +++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax.c
> > > > @@ -1,6 +1,5 @@
> > > > -/* { dg-do compile { target lp64 } } */
> > > 
> > > Does that work?  Why was it there before?
> > 
> > The lp64 eliminates 32-bit, which does not support hardware IEEE 128-bit
> > due to the lack of TImode.
> 
> I still do not understand this.  Why would support for QP float require
> TImode?  "Need an integer mode of the same size" is not a convincing
> argument, since double-double is a 16 byte mode as well.

I suspect it is because we separate moves for IBM long double before the pass
that wants to use an integer type to do the move, so it doesn't see the 128-bit
type.

> > The test was written before the ppc_float128_hw test.  Now
> > that we have ppc_float128_hw, we don't need an explicit lp64.
> 
> Ah good, some progress.  Well, it *is* an improvement, a better
> abstraction, but on the other hand it only hides the actual problems
> deeper :-/
> 
> > > >  /* { dg-require-effective-target powerpc_p9vector_ok } */
> > > > -/* { dg-require-effective-target float128 } */
> > > > +/* { dg-require-effective-target ppc_float128_hw } */
> > > 
> > > Why is it okay to no longer run this test where it ran before?
> > 
> > The ppc_float128_hw test is a more precise test than just float128 and 
> > power9.
> 
> You did not delete the p9 test though.

Yes, I can probably delete the powerpc_p9vector_ok test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH 1/3] Add IEEE 128-bit min/max support on PowerPC.

2021-06-28 Thread Michael Meissner via Gcc-patches
On Wed, Jun 23, 2021 at 06:56:37PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jun 17, 2021 at 03:18:48PM -0400, Michael Meissner wrote:
> > > The actual insns only check TARGET_POWER10 (so no TARGET_FLOAT128_HW).
> > > Which is right, this or that?
> > 
> > It should include TARGET_FLOAT128_HW.
> 
> Okay, so fix that :-)


> > The problem area is a power10 running in
> > big endian mode and running 32-bit code.  Because we don't have TImode, we
> > can't enable the IEEE 128-bit hardware instructions.
> 
> I don't see why not?
> 
> > > > +/* { dg-require-effective-target ppc_float128_hw } */
> > > > +/* { dg-require-effective-target power10_ok } */
> > > > +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */
> > > 
> > > In testcases we can assume that float128_hw is set whenever we have a
> > > p10; we don't manually disable it to make life hard for ourselves ;-)
> > 
> > Again, I put it in case somebody builds a BE power10 compiler.
> 
> This should still be fixed.  And yes, people do test BE p10, of course.
> And BE p10 *should* enable the QP float insns.  Does it not currently?

GCC does not enable __float128 by default on BE.  The reason is there are no
plans to enable all of the float128 support in glibc in BE.  Without a library,
it is kind of useless to enable __float128.

If the compiler enabled __float128, it would break things that check whether
__float128 is available.  They think __float128 is available, and then they fail
when they can't do anything besides basic arithmetic.

Because the compiler is configured not to enable __float128 in a BE context, we
don't build the __float128 emulator in libgcc.

In addition, BE GCC runs on systems that do not have glibc (like AIX).  If we
enabled it by default, it would break those environments.

A further complication is that the default BE cpu is still power4 or power5.
You need VSX support to pass __float128 arguments at all.  While it is possible to pass
__float128 in GPRs, you run into compatibility issues if one module is compiled
with VSX and another is compiled without setting a base cpu, because one module
will expect things in GPRs and the other in Altivec registers.

And as I've said, the issue with 32-bit move is we don't have TImode support.
Some of the machine-independent passes want to use an appropriate integer type
to move basic types.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH, rs6000] Update Power10 scheduling description for fused instruction types

2021-06-28 Thread Pat Haugen via Gcc-patches
On 6/7/21 3:41 PM, Pat Haugen via Gcc-patches wrote:
> Update Power10 scheduling description for new fused instruction types.
> 
> Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for
> trunk?
> 
> -Pat
> 
> 
> 2021-06-07  Pat Haugen  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/power10.md (power10-fused-load, power10-fused-store,
>   power10-fused_alu, power10-fused-vec, power10-fused-branch): New.
> 
> 

Forgot to ask: is this OK to backport to GCC 11, since the new instruction
types were backported?

-Pat



[PING][PATCH] correct handling of variable offset minus constant in -Warray-bounds (PR 100137)

2021-06-28 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573349.html

On 6/21/21 4:25 PM, Martin Sebor wrote:

-Warray-bounds relies on logic similar to that of -Wstringop-overflow et al.,
but uses its own algorithm, including its own bugs such as PR 100137.
The attached patch takes the first step toward unifying the logic
between the warnings.  It changes a subset of -Warray-bounds to call
compute_objsize() to detect out-of-bounds indices.  Besides fixing
the bug this also nicely simplifies the code and improves
the consistency between the informational messages printed by both
classes of warnings.

The changes to the test suite are extensive mainly because of
the different format of the diagnostics resulting from slightly
tighter bounds of offsets computed by the new algorithm, and in
smaller part because the change lets -Warray-bounds diagnose some
problems it previously missed due to the limitations of its own
solution.

The false positive reported in PR 100137 is a 10/11/12 regression
but this change is too intrusive to backport.  I have a smaller
and more targeted patch I plan to backport in its stead.

Tested on x86_64-linux.

Martin




Re: [PATCH v5 2/2] x86: Add vec_duplicate expander

2021-06-28 Thread H.J. Lu via Gcc-patches
On Mon, Jun 28, 2021 at 5:36 AM Richard Sandiford
 wrote:
>
> "H.J. Lu"  writes:
> > On Sun, Jun 27, 2021 at 2:00 PM Richard Sandiford
> >  wrote:
> >>
> >> "H.J. Lu via Gcc-patches"  writes:
> >> > On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> "H.J. Lu"  writes:
> >> >> > 1. Update vec_duplicate to allow to fail so that backend can only allow
> >> >> > broadcasting an integer constant to a vector when broadcast instruction
> >> >> > is available.  This can be used by memset expander to avoid vec_duplicate
> >> >> > when loading from constant pool is more efficient.
> >> >>
> >> >> I don't see any changes in target-independent code though, other than
> >> >> the doc update.  It's still the case that (existing) uses of
> >> >> vec_duplicate_optab do not allow it to fail.
> >> >
> >> > I have a followup patch set on
> >> >
> >> > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast
> >> >
> >> > to use it to expand memset with vector broadcast:
> >> >
> >> > https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3
> >> >
> >> > For SSE2 which doesn't have vector broadcast, the constant vector broadcast
> >> > expander returns FAIL and load from constant pool will be used.
> >>
> >> Hmm, but as Jeff and I mentioned in the earlier replies,
> >> vec_duplicate_optab shouldn't be used for constants.  Constants
> >> should go via the move expanders instead.
> >>
> >> In a previous message I suggested:
> >>
> >>   … would it work to change:
> >>
> >>     /* Try using vec_duplicate_optab for uniform vectors.  */
> >>     if (!TREE_SIDE_EFFECTS (exp)
> >>         && VECTOR_MODE_P (mode)
> >>         && eltmode == GET_MODE_INNER (mode)
> >>         && ((icode = optab_handler (vec_duplicate_optab, mode))
> >>             != CODE_FOR_nothing)
> >>         && (elt = uniform_vector_p (exp)))
> >>
> >>   to something like:
> >>
> >>     /* Try using vec_duplicate_optab for uniform vectors.  */
> >>     if (!TREE_SIDE_EFFECTS (exp)
> >>         && VECTOR_MODE_P (mode)
> >>         && eltmode == GET_MODE_INNER (mode)
> >>         && (elt = uniform_vector_p (exp)))
> >>       {
> >>         if (TREE_CODE (elt) == INTEGER_CST
> >>             || TREE_CODE (elt) == POLY_INT_CST
> >>             || TREE_CODE (elt) == REAL_CST
> >>             || TREE_CODE (elt) == FIXED_CST)
> >>           {
> >>             rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
> >>             emit_move_insn (target, src);
> >>             break;
> >>           }
> >>         …
> >>       }
> >>
> >> if that code was the source of the constant operand.  If we're adding a
> >> new use of vec_duplicate_optab then that should be similarly protected
> >> against constant operands.
> >>
> >
> > Your comments apply to my initial vec_duplicate patch that caused the
> > gcc.dg/pr100239.c failure.  It has been fixed by
> >
> > commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515
> > Author: Richard Biener 
> > Date:   Mon Jun 7 20:08:13 2021 +0200
> >
> > middle-end/100951 - make sure to generate VECTOR_CST in lowering
> >
> > When vector lowering creates piecewise ops make sure to create
> > VECTOR_CSTs instead of CONSTRUCTORs when possible.
> >
> > The problem I am running into now is in my memset vector broadcast
> > patch.  In order to optimize vector broadcast for memset, I need to
> > generate a pseudo register for
> >
> >  __builtin_memset (ops, 3, 38);
> >
> > only when vector broadcast is available:
> >
> >   rtx target = nullptr;
> >
> >   unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
> >   machine_mode vector_mode;
> >   if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
> >     gcc_unreachable ();
> >
> >   enum insn_code icode = optab_handler (vec_duplicate_optab,
> >                                         vector_mode);
> >   if (icode != CODE_FOR_nothing)
> >     {
> >       rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
> >       class expand_operand ops[2];
> >       create_output_operand (&ops[0], reg, vector_mode);
> >       create_input_operand (&ops[1], data, QImode);
> >       if (maybe_expand_insn (icode, 2, ops))
> >         {
> >           if (!rtx_equal_p (reg, ops[0].value))
> >             emit_move_insn (reg, ops[0].value);
> >           target = lowpart_subreg (mode, reg, vector_mode);
> >         }
> >     }
> >
> >   return target;  <<< Return nullptr to load from constant pool.
>
> I don't think this is a correct use of vec_duplicate_optab.  If the
> scalar operand is a constant then the move should always go through
> the move expanders instead, as a move from a CONST_VECTOR.

Like this?

  enum insn_code icode = optab_handler (vec_duplicate_optab,
                                        vector_mode);
  if (icode != CODE_FOR_nothing)
    {
      rtx

Re: [PATCH, rs6000] Update Power10 scheduling description for fused instruction types

2021-06-28 Thread Segher Boessenkool
On Mon, Jun 28, 2021 at 02:31:24PM -0500, Pat Haugen wrote:
> On 6/7/21 3:41 PM, Pat Haugen via Gcc-patches wrote:
> > Update Power10 scheduling description for new fused instruction types.

> > * config/rs6000/power10.md (power10-fused-load, power10-fused-store,
> > power10-fused_alu, power10-fused-vec, power10-fused-branch): New.

> Forgot to ask if ok to backport to GCC 11 since the new instruction types 
> were backported.

Yes, this is okay to backport.  Thanks!


Segher


Re: [PATCH 2/13] v2 Use new per-location warning APIs in Ada.

2021-06-28 Thread Martin Sebor via Gcc-patches

On 6/23/21 11:07 PM, Jeff Law wrote:



On 6/4/2021 3:41 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the Ada front
end with the new suppress_warning(), warning_suppressed_p(), and
copy_warning() APIs.

gcc-no-warning-ada.diff

Add support for per-location warning groups.

gcc/ada/ChangeLog:

* gcc-interface/trans.c (Handled_Sequence_Of_Statements_to_gnu):
Replace TREE_NO_WARNING with suppress_warning.
(gnat_gimplify_expr): Same.
* gcc-interface/utils.c (gnat_pushdecl): Same.

OK once prereqs are approved.
jeff



Pushed in r12-1857.

Martin


Re: [PATCH 7/13] v2 Use new per-location warning APIs in the FORTRAN front end

2021-06-28 Thread Martin Sebor via Gcc-patches

On 6/23/21 11:05 PM, Jeff Law wrote:



On 6/4/2021 3:42 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the FORTRAN
front end with the new suppress_warning() API.

gcc-no-warning-fortran.diff

Add support for per-location warning groups.

gcc/fortran/ChangeLog:

* trans-array.c (trans_array_constructor): Replace direct uses
of TREE_NO_WARNING with warning_suppressed_p, and suppress_warning.
* trans-decl.c (gfc_build_qualified_array): Same.
(gfc_build_dummy_array_decl): Same.
(generate_local_decl): Same.
(gfc_generate_function_code): Same.
* trans-openmp.c (gfc_omp_clause_default_ctor): Same.
(gfc_omp_clause_copy_ctor): Same.
* trans-types.c (get_dtype_type_node): Same.
(gfc_get_desc_dim_type): Same.
(gfc_get_array_descriptor_base): Same.
(gfc_get_caf_vector_type): Same.
(gfc_get_caf_reference_type): Same.
* trans.c (gfc_create_var_np): Same.

OK once prereqs are approved.
jeff



Retested and pushed in r12-1858.

Martin


Re: [PATCH 8/13] v2 Use new per-location warning APIs in libcc1

2021-06-28 Thread Martin Sebor via Gcc-patches

On 6/23/21 11:04 PM, Jeff Law wrote:



On 6/4/2021 3:42 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in libcc1 with
the new suppress_warning() API.

gcc-no-warning-libcc1.diff

Add support for per-location warning groups.

libcc1/ChangeLog:

* libcp1plugin.cc (record_decl_address): Replace a direct use
of TREE_NO_WARNING with suppress_warning.

OK once prereqs are approved.
jeff



Pushed in r12-1859.

Martin


Re: [PATCH 11/13] v2 Use new per-location warning APIs in the Objective-C front end

2021-06-28 Thread Martin Sebor via Gcc-patches

On 6/23/21 11:02 PM, Jeff Law wrote:



On 6/4/2021 3:43 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in
the Objective-C front end with the new suppress_warning(),
warning_suppressed_p(), and copy_warning() APIs.

gcc-no-warning-objc.diff

Add support for per-location warning groups.

gcc/objc/ChangeLog:

* objc-act.c (objc_maybe_build_modify_expr): Replace direct uses
of TREE_NO_WARNING with warning_suppressed_p, and suppress_warning.
(objc_build_incr_expr_for_property_ref): Same.
(objc_build_struct): Same.
(synth_module_prologue): Same.
* objc-gnu-runtime-abi-01.c (gnu_runtime_01_initialize): Same.
* objc-next-runtime-abi-01.c (next_runtime_01_initialize): Same.
* objc-next-runtime-abi-02.c (next_runtime_02_initialize): Same.

OK once prereqs are approved.

Jeff


Retested and pushed in r12-1860.

Martin


Re: [PATCH 13/13] v2 Add regression tests for PR 74765 and 74762

2021-06-28 Thread Martin Sebor via Gcc-patches

On 6/23/21 10:56 PM, Jeff Law wrote:



On 6/4/2021 3:43 PM, Martin Sebor via Gcc-patches wrote:

The attached patch adds regression tests for two closely related bugs
resolved by the patch series.

gcc-no-warning-tests.diff

Regression tests for TREE_NO_WARNING enhancement to warning groups.

PR middle-end/74765 - missing uninitialized warning (parenthesis, TREE_NO_WARNING abuse)
PR middle-end/74762 - [9/10/11/12 Regression] missing uninitialized warning (C++, parenthesized expr, TREE_NO_WARNING)

gcc/testsuite/ChangeLog:

* g++.dg/uninit-pr74762.C: New test.
* g++.dg/warn/uninit-pr74765.C: Same.

diff --git a/gcc/testsuite/g++.dg/uninit-pr74762.C b/gcc/testsuite/g++.dg/uninit-pr74762.C

This is OK once the prereqs are all approved.
jeff



Pushed in r12-1861.

Martin


Re: [PATCH v2] fixinc: don't "fix" machine names in __has_include(...) [PR91085]

2021-06-28 Thread Bruce Korb via Gcc-patches

Hi Xi,

On 6/27/21 11:07 PM, Xi Ruoyao wrote:

diff --git a/fixincludes/fixfixes.c b/fixincludes/fixfixes.c
index 5b23a8b640d..147cba716c7 100644
--- a/fixincludes/fixfixes.c
+++ b/fixincludes/fixfixes.c
@@ -524,7 +524,7 @@ FIX_PROC_HEAD( machine_name_fix )
/* If the 'name_pat' matches in between base and limit, we have
   a bogon.  It is not worth the hassle of excluding comments
   because comments on #if/#ifdef lines are rare, and strings on
- such lines are illegal.
+ such lines are only legal in a "__has_include" directive.
  
   REG_NOTBOL means 'base' is not at the beginning of a line, which
   shouldn't matter since the name_re has no ^ anchor, but let's
@@ -544,6 +544,31 @@ FIX_PROC_HEAD( machine_name_fix )
  break;
  
p = base + match[0].rm_so;


This function is already 90 lines long. This would be better as a separate function.


+
+  /* Check if the match is in __has_include(...) (PR 91085). */
+  for (q = base; q < p; q++)
+    if (!strncmp (q, "__has_include", 13))
+      {
+        r = q + 13;
+        while (r < p && ISSPACE (*r))
+          r++;
+
+        /* "__has_include" may appear as "defined(__has_include)",
+           search for the next appearance then.  */
+        if (*r != '(')
+          continue;
+
+        /* To avoid too much complexity, just hope there is never a
+           ')' in a header name.  */
+        while (r < limit && *r != ')')
+          r++;

strchr()? I'd use strchr() to find the start of "__has_include" as well. 
A character-by-character search is more obtuse and any CPU cycle savings 
are pretty marginal. Also:


char const has_inc[] = "__has_include";
int const has_inc_len = sizeof(has_inc) - 1;


It makes what's going on more plain by eliminating a magic number (13).
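To make the two suggestions above concrete (a separate function, and library searches with a named constant instead of the magic 13), here is a minimal standalone sketch. The function name match_in_has_include and its pointer-based interface are hypothetical, not part of fixincludes; it assumes the scanned text is NUL-terminated so strstr and strchr cannot run past the end, and it keeps the patch's simplification that a header name never contains ')'.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of fixincludes: return true if the
   regex match [MATCH_START, MATCH_END) sits inside the parentheses of a
   "__has_include (...)" that begins at or after BASE.  Assumes the text
   is NUL-terminated.  */
static bool
match_in_has_include (const char *base, const char *match_start,
                      const char *match_end)
{
  static const char has_inc[] = "__has_include";
  const size_t has_inc_len = sizeof (has_inc) - 1;
  const char *q = base;

  while ((q = strstr (q, has_inc)) != NULL && q < match_start)
    {
      const char *r = q + has_inc_len;
      while (r < match_start && isspace ((unsigned char) *r))
        r++;

      /* "defined (__has_include)" has no argument list; keep looking.  */
      if (*r == '(')
        {
          /* Like the patch, assume a header name never contains ')'.  */
          const char *close = strchr (r, ')');
          if (close != NULL && close >= match_end)
            return true;
        }
      q = r;
    }
  return false;
}
```

With this split, machine_name_fix would only need a call such as `match_in_has_include (base, p, base + match[0].rm_eo)` and could then skip past the closing parenthesis.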


+        if (r >= base + match[0].rm_eo)
+          {
+            base = r;
+            goto again;
+          }
+      }
+
base += match[0].rm_eo;
  
/* One more test: if on the same line we have the same string

diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
index 3a4cfe06542..31389396af6 100644
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -3151,7 +3151,8 @@ fix = {
  c_fix = machine_name;
  
  test_text = "/* MACH_DIFF: */\n"
-"#if defined( i386 ) || defined( sparc ) || defined( vax )"
+"#if defined( i386 ) || defined( sparc ) || defined( vax ) || "
+"defined( linux ) || __has_include (  ) || defined ( linux )"
No need for a redundant "defined(linux)" test. If you want to test 
superfluous spaces around the parentheses, just do it for one of the 
machine types.

  "\n/* no uniform test, so be careful  :-) */";
  };
  
diff --git a/fixincludes/tests/base/testing.h b/fixincludes/tests/base/testing.h
index cf95321fb86..00e8dde003e 100644
--- a/fixincludes/tests/base/testing.h
+++ b/fixincludes/tests/base/testing.h
@@ -64,7 +64,7 @@ BSD43__IOWR('T', 1) /* Some are multi-line */
  
  #if defined( MACHINE_NAME_CHECK )
  /* MACH_DIFF: */
-#if defined( i386 ) || defined( sparc ) || defined( vax )
+#if defined( i386 ) || defined( sparc ) || defined( vax ) || defined( linux ) || __has_include (  ) || defined ( linux )
  /* no uniform test, so be careful  :-) */
  #endif  /* MACHINE_NAME_CHECK */


Thanks for working on this.

Regards, Bruce



[PATCH v6 2/2] x86: Add vec_duplicate expander

2021-06-28 Thread H.J. Lu via Gcc-patches
Add vec_duplicate expander for SSE2 if we can move from GPR to SSE
register directly.

* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
Make it global.
* config/i386/i386-protos.h (ix86_expand_vector_init_duplicate):
New prototype.
* config/i386/sse.md (INT_BROADCAST_MODE): New mode iterator.
(vec_duplicate): New expander.
---
 gcc/config/i386/i386-expand.c |  5 +
 gcc/config/i386/i386-protos.h |  2 ++
 gcc/config/i386/sse.md        | 31 +++
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index e0e3ed4d8a4..e04019c4b79 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -93,9 +93,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "i386-builtins.h"
 #include "i386-expand.h"
 
-static bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx,
-  rtx);
-
 /* Split one or more double-mode RTL references into pairs of half-mode
references.  The RTL can be REG, offsettable MEM, integer constant, or
CONST_DOUBLE.  "operands" is a pointer to an array of double-mode RTLs to
@@ -13909,7 +13906,7 @@ static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 /* A subroutine of ix86_expand_vector_init.  Store into TARGET a vector
with all elements equal to VAR.  Return true if successful.  */
 
-static bool
+bool
 ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
   rtx target, rtx val)
 {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 71745b9a1ea..51376fcc454 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -258,6 +258,8 @@ extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_abs (rtx, rtx);
+extern bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx,
+  rtx);
 
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ffcc0c81964..5ededaedac7 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -24814,3 +24814,34 @@ (define_insn "*aesu8"
   "TARGET_WIDEKL"
   "aes\t{%0}"
   [(set_attr "type" "other")])
+
+;; Modes handled by broadcast patterns.  NB: Allow V64QI and V32HI with
+;; TARGET_AVX512F since ix86_expand_vector_init_duplicate can expand
+;; without TARGET_AVX512BW which is used by memset vector broadcast
+;; expander to XI with:
+;;	vmovd		%edi, %xmm15
+;;	vpbroadcastb	%xmm15, %ymm15
+;;	vinserti64x4	$0x1, %ymm15, %zmm15, %zmm15
+
+(define_mode_iterator INT_BROADCAST_MODE
+  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
+   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_64BIT")
+   (V4DI "TARGET_AVX && TARGET_64BIT") (V2DI "TARGET_64BIT")])
+
+;; Broadcast from an integer.  NB: Enable broadcast only if we can move
+;; from GPR to SSE register directly.
+(define_expand "vec_duplicate"
+  [(set (match_operand:INT_BROADCAST_MODE 0 "register_operand")
+	(vec_duplicate:INT_BROADCAST_MODE
+	  (match_operand: 1 "nonimmediate_operand")))]
+  "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC"
+{
+  if (!ix86_expand_vector_init_duplicate (false,
+					  GET_MODE (operands[0]),
+					  operands[0],
+					  operands[1]))
+    gcc_unreachable ();
+  DONE;
+})
-- 
2.31.1
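The expander in this patch, and the memset use of it discussed earlier in the thread, rely on vec_duplicate replicating one scalar into every vector lane. As a portable-C sketch of that idea on a 64-bit word rather than an SSE register (the names dup_byte_u64 and fill_with_broadcast are hypothetical, and real x86 code would use vmovd/vpbroadcastb as in the .md comment rather than a multiply), a memset-style fill does one broadcast and then wide stores:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar analogue of duplicating a QImode value into a V8QI vector:
   copy byte V into every byte lane of a 64-bit word.  */
static uint64_t
dup_byte_u64 (uint8_t v)
{
  return (uint64_t) v * UINT64_C (0x0101010101010101);
}

/* Hypothetical memset-style fill: one broadcast, then 8-byte stores,
   then a byte loop for the tail.  */
static void
fill_with_broadcast (unsigned char *dst, uint8_t v, size_t n)
{
  const uint64_t word = dup_byte_u64 (v);
  size_t i = 0;

  for (; i + sizeof word <= n; i += sizeof word)
    memcpy (dst + i, &word, sizeof word);  /* one wide store */
  for (; i < n; i++)
    dst[i] = v;                            /* remaining tail bytes */
}
```

Because every byte of the broadcast word is identical, the result does not depend on endianness; `fill_with_broadcast (buf, 3, 38)` mirrors the `__builtin_memset (ops, 3, 38)` example with four wide stores and a six-byte tail.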



[PATCH v6 0/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

2021-06-28 Thread H.J. Lu via Gcc-patches
Changes in the v6 patch:

1. Update SI/DI broadcast with AVX.
2. Require non-standard SSE constant integer broadcast with AVX.
3. Use nonimmediate_operand in vec_duplicate and verify that it
never fails.

Changes in the v5 patch:

1. Allow AVX with SI/DI broadcast.
2. Add a comment for broadcasting to V64QI and V32HI with AVX512F, but
without AVX512BW.

---
1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
operands to vector broadcast from an integer with AVX2.
2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
won't increase stack alignment requirement and blocks transformation by
the combine pass.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

shows that broadcast is a little bit faster on Intel Core i7-8559U:

$ make
gcc -g -I. -O2   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory  : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

broadcast is also smaller:

$ size memory.o broadcast.o
   text    data     bss     dec     hex filename
    132       0       0     132      84 memory.o
    122       0       0     122      7a broadcast.o
$

3. Update PR 87767 tests to expect integer broadcast instead of broadcast
from memory.
4. Update avx512f_cond_move.c to expect integer broadcast.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast

shows that integer broadcast is faster than embedded memory broadcast:

$ make
gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -o test test.o memory.o broadcast.o
./test
memory  : 425538
broadcast   : 375260
$
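Item 1 rests on recognizing that a wide constant is really one small integer repeated, so it can be materialized by broadcasting that integer instead of loading 16 or more bytes from the constant pool. Below is a rough, hypothetical illustration of that check on a raw byte image. The function name and byte-array interface are assumptions; the actual patch works on CONST_WIDE_INT/CONST_VECTOR RTL with its own mode and ISA restrictions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not GCC's own code: return the smallest element
   size (1, 2, 4 or 8 bytes) such that the SIZE-byte constant in BYTES
   is that element repeated, or 0 if there is none.  */
static size_t
broadcast_elem_size (const unsigned char *bytes, size_t size)
{
  for (size_t elt = 1; elt <= 8 && elt <= size; elt *= 2)
    {
      if (size % elt != 0)
        continue;

      bool uniform = true;
      /* Compare every ELT-sized chunk against the first one.  */
      for (size_t i = elt; i < size && uniform; i += elt)
        uniform = memcmp (bytes, bytes + i, elt) == 0;
      if (uniform)
        return elt;
    }
  return 0;
}
```

For example, the 16-byte constant behind a memset with value 3 is uniform at element size 1, a vector of four identical 32-bit values is uniform at size 4, and a vector of distinct bytes returns 0, so it would stay a constant-pool load.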

Add vec_duplicate expander for SSE2 if we can move from GPR to SSE
register directly.

H.J. Lu (2):
  x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast
  x86: Add vec_duplicate expander

 gcc/config/i386/i386-expand.c | 193 --
 gcc/config/i386/i386-protos.h |   4 +
 gcc/config/i386/i386.c|  13 ++
 gcc/config/i386/sse.md|  31 +++
 .../i386/avx512f-broadcast-pr87767-1.c|   7 +-
 .../i386/avx512f-broadcast-pr87767-5.c|   5 +-
 .../gcc.target/i386/avx512f_cond_move.c   |   4 +-
 .../i386/avx512vl-broadcast-pr87767-1.c   |  12 +-
 .../i386/avx512vl-broadcast-pr87767-5.c   |   9 +-
 gcc/testsuite/gcc.target/i386/pr100865-1.c|  13 ++
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-11a.c  |  23 +++
 gcc/testsuite/gcc.target/i386/pr100865-11b.c  |   8 +
 gcc/testsuite/gcc.target/i386/pr100865-11c.c  |   8 +
 gcc/testsuite/gcc.target/i386/pr100865-12a.c  |  20 ++
 gcc/testsuite/gcc.target/i386/pr100865-12b.c  |   8 +
 gcc/testsuite/gcc.target/i386/pr100865-12c.c  |   8 +
 gcc/testsuite/gcc.target/i386/pr100865-2.c|  14 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c|  15 ++
 gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-4b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-5a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-5b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-6a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-6b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-6c.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-7a.c   |  17 ++
 gcc/testsuite/gcc.target/i386/pr100865-7b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-7c.c   |  17 ++
 gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  24 +++
 gcc/testsuite/gcc.target/i386/pr100865-8b.c   |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-8c.c   |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-9a.c   |  25 +++
 gcc/testsuite/gcc.target/i386/pr100865-9b.c   |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-9c.c   |   7 +
 36 files changed, 621 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-11a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-11b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-11c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-12a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-12b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-12c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4b.c
 create mode 100644 gcc/testsuite/gcc.tar

[PATCH v6 1/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

2021-06-28 Thread H.J. Lu via Gcc-patches
1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
operands to vector broadcast from an integer with AVX.
2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
won't increase stack alignment requirement and blocks transformation by
the combine pass.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

shows that broadcast is a little bit faster on Intel Core i7-8559U:

$ make
gcc -g -I. -O2   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory  : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

broadcast is also smaller:

$ size memory.o broadcast.o
   text    data     bss     dec     hex filename
    132       0       0     132      84 memory.o
    122       0       0     122      7a broadcast.o
$

3. Update PR 87767 tests to expect integer broadcast instead of broadcast
from memory.
4. Update avx512f_cond_move.c to expect integer broadcast.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast

shows that integer broadcast is faster than embedded memory broadcast:

$ make
gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -o test test.o memory.o broadcast.o
./test
memory  : 425538
broadcast   : 375260
$

gcc/

PR target/100865
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
New prototype.
(ix86_byte_broadcast): New function.
(ix86_convert_const_wide_int_to_broadcast): Likewise.
(ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
size is 16 bytes or bigger.
(ix86_broadcast_from_integer_constant): New function.
(ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
to broadcast if mode size is 16 bytes or bigger.
* config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
prototype.
* config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.

gcc/testsuite/

PR target/100865
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
broadcast.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512f_cond_move.c: Also pass
-mprefer-vector-width=512 and expect integer broadcast.
* gcc.target/i386/pr100865-1.c: New test.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr100865-4a.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-5a.c: Likewise.
* gcc.target/i386/pr100865-5b.c: Likewise.
* gcc.target/i386/pr100865-6a.c: Likewise.
* gcc.target/i386/pr100865-6b.c: Likewise.
* gcc.target/i386/pr100865-6c.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7b.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/pr100865-8a.c: Likewise.
* gcc.target/i386/pr100865-8b.c: Likewise.
* gcc.target/i386/pr100865-8c.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr100865-9b.c: Likewise.
* gcc.target/i386/pr100865-9c.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Likewise.
* gcc.target/i386/pr100865-10b.c: Likewise.
* gcc.target/i386/pr100865-11a.c: Likewise.
* gcc.target/i386/pr100865-11b.c: Likewise.
* gcc.target/i386/pr100865-11c.c: Likewise.
* gcc.target/i386/pr100865-12a.c: Likewise.
* gcc.target/i386/pr100865-12b.c: Likewise.
* gcc.target/i386/pr100865-12c.c: Likewise.
---
 gcc/config/i386/i386-expand.c | 194 --
 gcc/config/i386/i386-protos.h |   2 +
 gcc/config/i386/i386.c|  13 ++
 .../i386/avx512f-broadcast-pr87767-1.c|   7 +-
 .../i386/avx512f-broadcast-pr87767-5.c|   5 +-
 .../gcc.target/i386/avx512f_cond_move.c   |   4 +-
 .../i386/avx512vl-broadcast-pr87767-1.c   |  12 +-
 .../i386/avx512vl-broadcast-pr87767-5.c   |   9 +-
 gcc/testsuite/gcc.target/i386/pr100865-1.c|  13 ++
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-11a.c  |  23 +++
 gcc/testsuite/gcc.target/i386/pr100865-11b.c  |   8 +
 gcc/testsuite/gcc.target/i386/pr100865-11c.c  |   8 +
 gcc/testsuite/gcc.target/i386/pr100865-12a.c  |  20 ++
 gcc/testsuite/gcc.target/i386/pr100865-12b.c  |   8 +
 gcc/testsuite/gcc.target/i386/pr100865-12c.c  |   8 +
 gcc/

Re: [Patch] Add 'default' to -foffload=; document that flag [PR67300]

2021-06-28 Thread Sandra Loosemore

On 6/28/21 9:51 AM, Tobias Burnus wrote:
I managed to delete the libgomp part before posting the patch, hence
the repost.


(The change from -foffload= to -foffload-options= ensures that other
configured offload compilers, such as GCN, are also used, an issue that
Thomas found. The original -foffload=nvptx-none=-latomic was added because
otherwise the GCN part caused build issues for Richard.)


Thus, this patch is like v3, except for the invoke.texi fixes suggested 
by Sandra (thanks!) and an added ChangeLog; and it is like v4, except 
that the lost libgomp changes have been re-added (+ ChangeLog update).


I hope it now is fine.


Hmmm.


--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -197,17 +197,17 @@ in the following sections.
 
 @item C Language Options

 @xref{C Dialect Options,,Options Controlling C Dialect}.
-@gccoptlist{-ansi  -std=@var{standard}  -fgnu89-inline @gol
--fpermitted-flt-eval-methods=@var{standard} @gol
--aux-info @var{filename}  -fallow-parameterless-variadic-functions @gol
--fno-asm  -fno-builtin  -fno-builtin-@var{function}  -fgimple@gol
--fhosted  -ffreestanding @gol
+@gccoptlist{-ansi  -std=@var{standard}  -aux-info @var{filename} @gol
+-fallow-parameterless-variadic-functions  -fno-asm  @gol
+-fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch @gol
+-ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted @gol
+-flax-vector-conversions  -fms-extensions @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
+-foffload=@var{arg} -foffload-options=@var{arg} @gol


Still need two spaces between these options on the same line inside 
@gccoptlist.



 -fopenmp  -fopenmp-simd @gol
--fms-extensions  -fplan9-extensions  -fsso-struct=@var{endianness} @gol
--fallow-single-precision  -fcond-mismatch  -flax-vector-conversions @gol
--fsigned-bitfields  -fsigned-char @gol
--funsigned-bitfields  -funsigned-char}
+-fpermitted-flt-eval-methods=@var{standard} @gol
+-fplan9-extensions -fsigned-bitfields -funsigned-bitfields @gol
+-fsigned-char -funsigned-char -fsso-struct=@var{endianness}}


And on both of the last two lines here.

I didn't think it was necessary to alphabetize the actual documentation 
of the options (only the table in the option summary).  I'll have to 
assume that you didn't actually change any of the text you moved around. 
 The text for -foffload and -foffload-options looks fine now.


The documentation part of the patch is OK with the whitespace changes 
(no need to post another version for me to review that).


-Sandra


Re: [PATCH] Rearrange detection of temporary directory for NetBSD

2021-06-28 Thread Gerald Pfeifer
On Thu, 26 Mar 2020, Kamil Rytarowski wrote:
> On 25.03.2020 23:36, Jeff Law wrote:
>> I wouldn't mind dropping /usr/tmp.  That's so antiquated that it'd be 
>> non-controversial.  Can you send that as a separate patch?
> Behavior for !__NetBSD__ is outside my interest.

This is not a very useful approach in a collaborative project like GCC.

Incremental changes (including cleanups) help and are a good way to get 
engaged, improve the overall code base, and gain support from others 
(who may not have any interest in the __NetBSD__ case, but be willing 
to collaborate).

@Jeff, is the following what you had in mind?  

It passed testing on i686-unknown-freebsd12; okay to push?

Gerald


commit 8365565396cee65aeb6c2e4bfad74e095a3c388c
Author: Gerald Pfeifer 
Date:   Tue Jun 29 00:39:15 2021 +0200

libiberty: No longer use /usr/tmp

/usr/tmp is antiquated and not present on decently modern systems.
Remove it from consideration when choosing a directory for temporary
files.

libiberty:

2021-06-29  Gerald Pfeifer  

* make-temp-file.c (usrtmp): Remove.
(choose_tmpdir): Remove use of usrtmp.

diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
index 1c9138861bd..2f8390cc63a 100644
--- a/libiberty/ChangeLog
+++ b/libiberty/ChangeLog
@@ -1,3 +1,8 @@
+2021-06-13  Gerald Pfeifer  
+
+   * make-temp-file.c (usrtmp): Remove.
+   (choose_tmpdir): Remove use of usrtmp.
+
 2021-06-05  John David Anglin  
 
PR target/100734
diff --git a/libiberty/make-temp-file.c b/libiberty/make-temp-file.c
index 7465cec5ea6..cad0645619e 100644
--- a/libiberty/make-temp-file.c
+++ b/libiberty/make-temp-file.c
@@ -81,8 +81,6 @@ try_dir (const char *dir, const char *base)
 }
 
 static const char tmp[] = { DIR_SEPARATOR, 't', 'm', 'p', 0 };
-static const char usrtmp[] =
-{ DIR_SEPARATOR, 'u', 's', 'r', DIR_SEPARATOR, 't', 'm', 'p', 0 };
 static const char vartmp[] =
 { DIR_SEPARATOR, 'v', 'a', 'r', DIR_SEPARATOR, 't', 'm', 'p', 0 };
 
@@ -131,7 +129,6 @@ choose_tmpdir (void)
 
   /* Try /var/tmp, /usr/tmp, then /tmp.  */
   base = try_dir (vartmp, base);
-  base = try_dir (usrtmp, base);
   base = try_dir (tmp, base);
   
   /* If all else fails, use the current directory!  */
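For readers following the libiberty change, the remaining fallback chain can be pictured as below. This is a minimal, hypothetical sketch, not libiberty's code: the real choose_tmpdir does more than this, and try_dir_sketch/choose_tmpdir_sketch are made-up names. It keeps only the hard-coded directories visible in the diff and assumes a POSIX access() check for a usable directory.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep BASE if one was already chosen; otherwise
   return DIR if it is readable, writable and searchable, else NULL.  */
static const char *
try_dir_sketch (const char *dir, const char *base)
{
  if (base != NULL)
    return base;
  if (dir != NULL && access (dir, R_OK | W_OK | X_OK) == 0)
    return dir;
  return NULL;
}

/* Hard-coded fallbacks after the patch: /var/tmp, then /tmp
   (no more /usr/tmp), and finally the current directory.  */
static const char *
choose_tmpdir_sketch (void)
{
  const char *base = NULL;

  base = try_dir_sketch ("/var/tmp", base);
  base = try_dir_sketch ("/tmp", base);
  if (base == NULL)
    base = ".";
  return base;
}
```

Because each step returns an already chosen BASE unchanged, removing one directory from the chain (as the patch does with usrtmp) only removes one candidate; the order of the others is unaffected.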


Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]

2021-06-28 Thread Victor Tong via Gcc-patches
Thanks Richard and Marc.

I wrote the following test case to compare the outputs of fn1() and fn1NoOpt() 
below with my extra pattern being applied. I tested the two functions with all 
of the integers from INT_MIN to INT_MAX.

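#include <climits>
#include <cstdio>

// With the proposed pattern, this may be folded to (long) x.
long
fn1 (int x)
{
  return 42L - (long)(42 - x);
}

#pragma GCC push_options
#pragma GCC optimize ("O0")
// Reference version: built at -O0, and the volatile keeps the int
// subtraction (42 - x), which overflows for x <= INT_MIN + 42.
long
fn1NoOpt (int x)
{
  volatile int y = (42 - x);
  return 42L - (long)y;
}
#pragma GCC pop_options

int main ()
{
  // Compare both versions for every int value (about 4.3 billion calls).
  for (long i = INT_MIN; i <= INT_MAX; i++)
    {
      auto valNoOpt = fn1NoOpt (i);
      auto valOpt = fn1 (i);
      if (valNoOpt != valOpt)
        printf ("valOpt=%ld, valNoOpt=%ld\n", valOpt, valNoOpt);
    }
  return 0;
}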

I saw that the return values of fn1() and fn1NoOpt() differed when the input 
was between INT_MIN and INT_MIN+42 inclusive. When passing values in this range 
to fn1NoOpt(), a signed overflow is triggered, which causes the value to differ 
(undefined behavior). This seems to be in line with what Marc described, and I 
think the transformation is correct in the scenario above. I do think that casts 
that truncate (i.e. from a higher precision to a lower one) or that involve 
unsigned types would make the transformation incorrect, so those 
scenarios need to be avoided.
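The observed boundary can be cross-checked without the full loop: 42 - x overflows int exactly when x < 42 - INT_MAX, i.e. for x in [INT_MIN, INT_MIN + 42], which is 43 values. A minimal sketch follows; it relies on __builtin_sub_overflow, a GCC/Clang extension rather than ISO C, and both helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if 42 - x overflows int, i.e. exactly the
   inputs for which the int subtraction in fn1NoOpt is undefined.  */
static bool
sub_from_42_overflows (int x)
{
  int result;
  return __builtin_sub_overflow (42, x, &result);
}

/* Closed form: 42 - x > INT_MAX  <=>  x < 42 - INT_MAX.
   (42 - x can never drop below INT_MIN for an int x.)  */
static bool
sub_from_42_overflows_closed_form (int x)
{
  return x < 42 - INT_MAX;
}
```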

Given that the extra pattern I'm adding takes advantage of the undefined 
behavior of signed integer overflow, I'm considering keeping the existing 
nop_convert pattern in place and adding a new pattern to cover these new cases. 
I'd also like to avoid touching nop_convert, given that it's used in a number of 
other patterns.

This is the pattern I have currently:

  (simplify
    (minus (convert1? @0) (convert2? (minus (convert3? @2) @1)))
    (if (operand_equal_p(@0, @2, 0)
        && INTEGRAL_TYPE_P (type)
        && TYPE_OVERFLOW_UNDEFINED(type)
        && !TYPE_OVERFLOW_SANITIZED(type)
        && INTEGRAL_TYPE_P (TREE_TYPE(@1))
        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@1))
        && !TYPE_UNSIGNED (TREE_TYPE (@1))
        && !TYPE_UNSIGNED (type)
        && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type)
        && INTEGRAL_TYPE_P (TREE_TYPE(@0))
        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
        && !TYPE_UNSIGNED (TREE_TYPE (@0))
        && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type)
        && TREE_TYPE(@1) == TREE_TYPE(@2))
    (convert @1)))

Is there a more concise/better way of writing the pattern? I was looking for 
similar checks in match.pd and I couldn't find anything that I could leverage.

I also restricted my pattern to the specific scenario I'm seeing in the 
regression, to lower the risk of breaking something. I've limited @1 and @2 to 
have the same type.

I'm also in favor of adding/running computer verification to make sure the 
transformation is legal. I've written some tests to verify that the pattern is 
being applied in the right scenarios and not being applied in others, but I 
think there are too many possibilities to manually write them all. Is there 
anything in GCC that can be used to verify that match.pd transformations are 
correct? I'm thinking of something like Alive 
https://github.com/AliveToolkit/alive2.
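As a hypothetical illustration of cheap computer verification for a rule of the form (T2)x - (T2)((T1)x - y) -> (T2)y, narrow types can simply be checked exhaustively. The sketch below is our own, not an existing GCC tool: it fixes T2 = int16_t and T3 = T1, tries an 8-bit unsigned and an 8-bit signed T1, and models signed overflow as undefined by skipping inputs whose T1 subtraction overflows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* T1 = T3 = uint8_t: the T1 subtraction wraps modulo 256.  */
static bool
identity_holds_u8 (void)
{
  for (int x = 0; x <= UINT8_MAX; x++)
    for (int y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t t1 = (uint8_t) (x - y);          /* wrapping subtraction */
        int16_t lhs = (int16_t) ((int16_t) x - (int16_t) t1);
        if (lhs != (int16_t) y)
          return false;                          /* counterexample found */
      }
  return true;
}

/* T1 = T3 = int8_t: overflow in the T1 subtraction is treated as
   undefined behavior, so those inputs impose no constraint.  */
static bool
identity_holds_s8 (void)
{
  for (int x = INT8_MIN; x <= INT8_MAX; x++)
    for (int y = INT8_MIN; y <= INT8_MAX; y++)
      {
        int diff = x - y;                        /* exact, computed in int */
        if (diff < INT8_MIN || diff > INT8_MAX)
          continue;                              /* UB in T1: skip */
        int16_t lhs = (int16_t) ((int16_t) x - (int16_t) (int8_t) diff);
        if (lhs != (int16_t) y)
          return false;
      }
  return true;
}
```

For the unsigned case, x = 0 and y = 1 already give 0 - 255 = -255 instead of 1, so the rewrite is invalid when T1 is unsigned and narrower than T2; the signed case holds for every defined input.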

Thanks,
Victor



From: Richard Biener 
Sent: Monday, June 21, 2021 12:08 AM
To: Marc Glisse 
Cc: Victor Tong ; gcc-patches@gcc.gnu.org 

Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division 
followed by multiply [PR95176] 
 
On Sat, Jun 19, 2021 at 7:05 PM Marc Glisse  wrote:
>
> On Fri, 18 Jun 2021, Richard Biener wrote:
>
> >> Option 2: Add a new pattern to support scenarios that the existing 
> >> nop_convert pattern bails out on.
> >>
> >> Existing pattern:
> >>
> >> (simplify
> >>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) 
> >>@1)))
> >>    (view_convert @1))
>
> I tried to check with a program when
>
> T3 x;
> T1 y;
> (T2)x-(T2)((T1)x-y)
>
> can be safely replaced with
>
> (T2)y
>
> From the output, it looks like this is safe when T1 is at least as large
> as T2. It is wrong when T1 is unsigned and smaller than T2. And when T1 is
> signed and smaller than T2, it is ok if T3 is the same type as T1 (signed
> then) or has strictly less precision (any sign), and not in other cases.
>
> Note that this is when signed implies undefined overflow and unsigned
> implies wrapping, and I wouldn't put too much faith in this recently
> dusted program. And it doesn't say how to write the match.pd pattern with
> '?', "@@", disabling it if TYPE_OVERFLOW_SANITIZED, etc.
>
> Mostly, I wanted to say that if we are going to go handle more than
> nop_convert for more than just 1 or 2 easy transformations, I think some
> kind of computer verification would be useful, it would save a lot of time
> and headaches.

True.  I wonder if auto-generating such tests from match.pd rules would
be a good project to w

Re: [PATCH 0/2] Ranger-based backwards threader implementation.

2021-06-28 Thread Martin Sebor via Gcc-patches

On 6/28/21 10:21 AM, Aldy Hernandez via Gcc-patches wrote:

This is the ranger-based backwards threader.  It is divided into two
parts: the solver and the path discovery bits.

The solver is generic enough that it may be of use to other passes,
so it's been abstracted into its own separate class/file.  Andrew and
I have already gone over it, so I don't think a review is necessary.
Besides, it's technically an extension of the ranger infrastructure.

On the other hand, the path discovery bits could benefit from the
watchful eye of the jump threading experts.

Documenting the solver in a [ranger-tech] post is on my TODO list,
as I think it would be useful as an example of GORI as a general
tool, outside the VRP world.

As I have mentioned elsewhere, I have gone through each test and
documented the reasons why they were adjusted (when useful).  The
reviewer(s) may benefit from looking at the test notes.

I have added a --param=threader-mode={ranger,legacy} option, which I
hope to remove shortly after.  It has been useful for diagnosing
issues in the past, though perhaps not so much now.  I've left it
in case there's a remote interest in using it during stage1, but
removing it could be a huge cleanup to tree-ssa-threadbackward.c.

If/when accepted, I will open 2-3 PRs with the XFAILed tests as
requested.  I am still working on distilling a C counterpart for
the libphobos missing thread edge.  It'll hopefully be ready by the
time the review is done.

A version of this patchset with the verification code has
been tested on x86-64, ppc64, ppc64le, and aarch64 (all Linux).

I am currently re-testing on x86-64 Linux, but will not re-test on the
rest of the architectures because...OMG aarch64 is so slow!


I applied the series and ran a subset of tests and didn't see any
failures, just the three XPASSes below.  The Wfree-nonheap-object
tests you mentioned in the other post all pass.  Looks like you
got past that problem?

XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 32)
XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 46)
XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 60)

A couple of comments on the tests below (I haven't looked at the meat
of the patch):



Thanks.
Aldy

Aldy Hernandez (2):
   Implement basic block path solver.
   Backwards jump threader rewrite with ranger.

  gcc/Makefile.in   |   6 +
  gcc/flag-types.h  |   7 +
  gcc/params.opt|  17 +
  .../g++.dg/debug/dwarf2/deallocator.C |   3 +-
  gcc/testsuite/gcc.c-torture/compile/pr83510.c |  33 ++
  gcc/testsuite/gcc.dg/Wrestrict-22.c   |   3 +


The change here just adds the comment:

+/* This looks like the threader caused the entire loop to collapse, and the
+   warning pass can't determine the arguments to memcpy.  */
+

Since the test passes I'm not sure I understand what the comment
is trying to say.  Is it still accurate and necessary?


  gcc/testsuite/gcc.dg/loop-unswitch-2.c|   2 +-
  gcc/testsuite/gcc.dg/old-style-asm-1.c|   5 +-
  gcc/testsuite/gcc.dg/pr68317.c|   4 +-
  gcc/testsuite/gcc.dg/pr97567-2.c  |   2 +-
  gcc/testsuite/gcc.dg/predict-9.c  |   4 +-
  gcc/testsuite/gcc.dg/shrink-wrap-loop.c   |  53 ++
  gcc/testsuite/gcc.dg/sibcall-1.c  |  10 +
  .../gcc.dg/tree-ssa/builtin-sprintf-3.c   |   5 +-


I wonder if breaking up the test function into five, one for each
of the tests it does, would be a better way to avoid the IL changes
than disabling all the threading passes.  Like in the attached patch.

Martin


  gcc/testsuite/gcc.dg/tree-ssa/pr21001.c   |   1 +
  gcc/testsuite/gcc.dg/tree-ssa/pr21294.c   |   1 +
  gcc/testsuite/gcc.dg/tree-ssa/pr21417.c   |   2 +-
  gcc/testsuite/gcc.dg/tree-ssa/pr21458-2.c |   2 +-
  gcc/testsuite/gcc.dg/tree-ssa/pr21563.c   |   2 +-
  gcc/testsuite/gcc.dg/tree-ssa/pr49039.c   |   2 +-
  gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c |   2 +-
  gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c |   2 +-
  gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c |   2 +-
  .../gcc.dg/tree-ssa/ranger-threader-1.c   |  20 +
  .../gcc.dg/tree-ssa/ranger-threader-2.c   |  39 ++
  .../gcc.dg/tree-ssa/ranger-threader-3.c   |  41 ++
  .../gcc.dg/tree-ssa/ranger-threader-4.c   |  83 +++
  gcc/testsuite/gcc.dg/tree-ssa/split-path-4.c  |   4 +-
  .../gcc.dg/tree-ssa/ssa-dom-thread-11.c   |   2 +-
  .../gcc.dg/tree-ssa/ssa-dom-thread-12.c   |   2 +-
  .../gcc.dg/tree-ssa/ssa-dom-thread-14.c   |   1 +
  .../gcc.dg/tree-ssa/ssa-dom-thread-18.c   |   5 +-
  .../gcc.dg/tree-ssa/ssa-dom-thread-6.c|   4 +-
  .../gcc.dg/tree-ssa/ssa-dom-thread-7.c|   1 +
  gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-48.c|   2 +-
  gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c |   1 +
  gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-12.c |   2 +-

[committed] analyzer: introduce byte_range and use to simplify dumps

2021-06-28 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 7c6b354b92b38f31cd2399fbdbc9d6f837881480.

gcc/analyzer/ChangeLog:
* analyzer.h (byte_offset_t): New typedef.
* store.cc (bit_range::dump_to_pp): Dump as a byte range if
possible.
(bit_range::as_byte_range): New.
(byte_range::dump_to_pp): New.
* store.h (class byte_range): New forward decl.
(struct bit_range): Add comment.
(bit_range::as_byte_range): New decl.
(struct byte_range): New.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.h |  1 +
 gcc/analyzer/store.cc   | 54 -
 gcc/analyzer/store.h| 25 +++
 3 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 525eb06c3b5..f06b68c1814 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -142,6 +142,7 @@ public:
 
 typedef offset_int bit_offset_t;
 typedef offset_int bit_size_t;
+typedef offset_int byte_offset_t;
 typedef offset_int byte_size_t;
 
 extern bool int_size_in_bits (const_tree type, bit_size_t *out);
diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index 320370326bd..d5f879835a0 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -241,12 +241,18 @@ binding_key::cmp (const binding_key *k1, const binding_key *k2)
 void
 bit_range::dump_to_pp (pretty_printer *pp) const
 {
-  pp_string (pp, "start: ");
-  pp_wide_int (pp, m_start_bit_offset, SIGNED);
-  pp_string (pp, ", size: ");
-  pp_wide_int (pp, m_size_in_bits, SIGNED);
-  pp_string (pp, ", next: ");
-  pp_wide_int (pp, get_next_bit_offset (), SIGNED);
+  byte_range bytes (0, 0);
+  if (as_byte_range (&bytes))
+    bytes.dump_to_pp (pp);
+  else
+    {
+      pp_string (pp, "start: ");
+      pp_wide_int (pp, m_start_bit_offset, SIGNED);
+      pp_string (pp, ", size: ");
+      pp_wide_int (pp, m_size_in_bits, SIGNED);
+      pp_string (pp, ", next: ");
+      pp_wide_int (pp, get_next_bit_offset (), SIGNED);
+    }
 }
 
 /* Dump this object to stderr.  */
@@ -329,6 +335,42 @@ bit_range::from_mask (unsigned HOST_WIDE_INT mask, bit_range *out)
   return true;
 }
 
+/* Attempt to convert this bit_range to a byte_range.
+   Return true if it is possible, writing the result to *OUT.
+   Otherwise return false.  */
+
+bool
+bit_range::as_byte_range (byte_range *out) const
+{
+  if (m_start_bit_offset % BITS_PER_UNIT == 0
+      && m_size_in_bits % BITS_PER_UNIT == 0)
+    {
+      out->m_start_byte_offset = m_start_bit_offset / BITS_PER_UNIT;
+      out->m_size_in_bytes = m_size_in_bits / BITS_PER_UNIT;
+      return true;
+    }
+  return false;
+}
+
+/* Dump this object to PP.  */
+
+void
+byte_range::dump_to_pp (pretty_printer *pp) const
+{
+  if (m_size_in_bytes == 1)
+    {
+      pp_string (pp, "byte ");
+      pp_wide_int (pp, m_start_byte_offset, SIGNED);
+    }
+  else
+    {
+      pp_string (pp, "bytes ");
+      pp_wide_int (pp, m_start_byte_offset, SIGNED);
+      pp_string (pp, "-");
+      pp_wide_int (pp, get_last_byte_offset (), SIGNED);
+    }
+}
+
 /* class concrete_binding : public binding_key.  */
 
 /* Implementation of binding_key::dump_to_pp vfunc for concrete_binding.  */
diff --git a/gcc/analyzer/store.h b/gcc/analyzer/store.h
index ca9ff696bca..e0c60e128fa 100644
--- a/gcc/analyzer/store.h
+++ b/gcc/analyzer/store.h
@@ -196,6 +196,7 @@ private:
   hash_set m_mutable_at_unknown_call_svals;
 };
 
+class byte_range;
 class concrete_binding;
 
 /* An enum for discriminating between "direct" vs "default" levels of
@@ -267,6 +268,8 @@ private:
   enum binding_kind m_kind;
 };
 
+/* A concrete range of bits.  */
+
 struct bit_range
 {
   bit_range (bit_offset_t start_bit_offset, bit_size_t size_in_bits)
@@ -308,10 +311,32 @@ struct bit_range
 
   static bool from_mask (unsigned HOST_WIDE_INT mask, bit_range *out);
 
+  bool as_byte_range (byte_range *out) const;
+
   bit_offset_t m_start_bit_offset;
   bit_size_t m_size_in_bits;
 };
 
+/* A concrete range of bytes.  */
+
+struct byte_range
+{
+  byte_range (byte_offset_t start_byte_offset, byte_size_t size_in_bytes)
+  : m_start_byte_offset (start_byte_offset),
+    m_size_in_bytes (size_in_bytes)
+  {}
+
+  void dump_to_pp (pretty_printer *pp) const;
+
+  byte_offset_t get_last_byte_offset () const
+  {
+    return m_start_byte_offset + m_size_in_bytes - 1;
+  }
+
+  byte_offset_t m_start_byte_offset;
+  byte_size_t m_size_in_bytes;
+};
+
 /* Concrete subclass of binding_key, for describing a concrete range of
bits within the binding_map (e.g. "bits 8-15").  */
 
-- 
2.26.3
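The new byte_range helpers boil down to simple arithmetic. The following is a minimal, hypothetical C re-statement (not analyzer code): it uses plain long long instead of offset_int, assumes 8-bit bytes for BITS_PER_UNIT, and mirrors bit_range::as_byte_range and the "byte N" / "bytes A-B" output of byte_range::dump_to_pp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_BITS_PER_UNIT 8   /* assumption: 8-bit bytes */

struct bit_range_s { long long start_bit; long long size_bits; };
struct byte_range_s { long long start_byte; long long size_bytes; };

/* Succeeds only if both the start and the size are whole bytes.  */
static bool
as_byte_range_sketch (const struct bit_range_s *bits,
                      struct byte_range_s *out)
{
  if (bits->start_bit % SKETCH_BITS_PER_UNIT == 0
      && bits->size_bits % SKETCH_BITS_PER_UNIT == 0)
    {
      out->start_byte = bits->start_bit / SKETCH_BITS_PER_UNIT;
      out->size_bytes = bits->size_bits / SKETCH_BITS_PER_UNIT;
      return true;
    }
  return false;
}

/* "byte N" for a single byte, otherwise "bytes FIRST-LAST", where
   LAST = start + size - 1 as in get_last_byte_offset.  */
static void
format_byte_range (const struct byte_range_s *r, char *buf, size_t len)
{
  if (r->size_bytes == 1)
    snprintf (buf, len, "byte %lld", r->start_byte);
  else
    snprintf (buf, len, "bytes %lld-%lld", r->start_byte,
              r->start_byte + r->size_bytes - 1);
}
```

For example, bits 16-23 dump as "byte 2" and bits 0-31 as "bytes 0-3", while a range starting at bit 3 is not byte-aligned and keeps the old "start: ..., size: ..., next: ..." form.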



Re: [PATCH v5 1/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

2021-06-28 Thread H.J. Lu via Gcc-patches
On Sun, Jun 27, 2021 at 6:43 PM Hongtao Liu  wrote:
>
> On Sun, Jun 27, 2021 at 4:02 AM H.J. Lu  wrote:
> >
> > 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
> > operands to vector broadcast from an integer with AVX2.
> > 2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
> > won't increase the stack alignment requirement and which blocks
> > transformation by the combine pass.
> >
> > A small benchmark:
> >
> > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast
> >
> > shows that broadcast is a little bit faster on Intel Core i7-8559U:
> >
> > $ make
> > gcc -g -I. -O2   -c -o test.o test.c
> > gcc -g   -c -o memory.o memory.S
> > gcc -g   -c -o broadcast.o broadcast.S
> > gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
> > gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
> > ./test
> > memory  : 147215
> > broadcast   : 121213
> > vec_dup_sse2: 171366
> > $
> >
> > broadcast is also smaller:
> >
> > $ size memory.o broadcast.o
> >    text    data     bss     dec     hex filename
> >     132       0       0     132      84 memory.o
> >     122       0       0     122      7a broadcast.o
> > $
> >
> > 3. Update PR 87767 tests to expect integer broadcast instead of broadcast
> > from memory.
> > 4. Update avx512f_cond_move.c to expect integer broadcast.
> >
> > A small benchmark:
> >
> > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast
> >
> > shows that integer broadcast is faster than embedded memory broadcast:
> >
> > $ make
> > gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
> > gcc -g   -c -o memory.o memory.S
> > gcc -g   -c -o broadcast.o broadcast.S
> > gcc -o test test.o memory.o broadcast.o
> > ./test
> > memory  : 425538
> > broadcast   : 375260
> > $
> >
> > gcc/
> >
> > PR target/100865
> > * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
> > New prototype.
> > (ix86_byte_broadcast): New function.
> > (ix86_convert_const_wide_int_to_broadcast): Likewise.
> > (ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
> > size is 16 bytes or bigger.
> > (ix86_broadcast_from_integer_constant): New function.
> > (ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
> > to broadcast if mode size is 16 bytes or bigger.
> > * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
> > prototype.
> > * config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.
> >
> > gcc/testsuite/
> >
> > PR target/100865
> > * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
> > broadcast.
> > * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
> > * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
> > * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
> > * gcc.target/i386/avx512f_cond_move.c: Also pass
> > -mprefer-vector-width=512 and expect integer broadcast.
> > * gcc.target/i386/pr100865-1.c: New test.
> > * gcc.target/i386/pr100865-2.c: Likewise.
> > * gcc.target/i386/pr100865-3.c: Likewise.
> > * gcc.target/i386/pr100865-4a.c: Likewise.
> > * gcc.target/i386/pr100865-4b.c: Likewise.
> > * gcc.target/i386/pr100865-5a.c: Likewise.
> > * gcc.target/i386/pr100865-5b.c: Likewise.
> > * gcc.target/i386/pr100865-6a.c: Likewise.
> > * gcc.target/i386/pr100865-6b.c: Likewise.
> > * gcc.target/i386/pr100865-6c.c: Likewise.
> > * gcc.target/i386/pr100865-7a.c: Likewise.
> > * gcc.target/i386/pr100865-7b.c: Likewise.
> > * gcc.target/i386/pr100865-7c.c: Likewise.
> > * gcc.target/i386/pr100865-8a.c: Likewise.
> > * gcc.target/i386/pr100865-8b.c: Likewise.
> > * gcc.target/i386/pr100865-9a.c: Likewise.
> > * gcc.target/i386/pr100865-9b.c: Likewise.
> > * gcc.target/i386/pr100865-10a.c: Likewise.
> > * gcc.target/i386/pr100865-10b.c: Likewise.
> > * gcc.target/i386/pr100865-11a.c: Likewise.
> > * gcc.target/i386/pr100865-11b.c: Likewise.
> > * gcc.target/i386/pr100865-12a.c: Likewise.
> > * gcc.target/i386/pr100865-12b.c: Likewise.
> > ---
> >  gcc/config/i386/i386-expand.c | 190 --
> >  gcc/config/i386/i386-protos.h |   2 +
> >  gcc/config/i386/i386.c|  13 ++
> >  .../i386/avx512f-broadcast-pr87767-1.c|   7 +-
> >  .../i386/avx512f-broadcast-pr87767-5.c|   5 +-
> >  .../gcc.target/i386/avx512f_cond_move.c   |   4 +-
> >  .../i386/avx512vl-broadcast-pr87767-1.c   |  12 +-
> >  .../i386/avx512vl-broadcast-pr87767-5.c   |   9 +-
> >  gcc/testsuite/gcc.target/i386/pr100865-1.c|  13 ++
> >  gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
> >  gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
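Item 1 of the patch concerns code that stores a 16-byte (or wider) vector constant whose elements are all equal, as in a memset-style fill. The sketch below is only an illustration written with GCC's vector extension; fill16 is a hypothetical name. Per the patch description, with AVX2 such a constant may be materialized by broadcasting from an integer register instead of being loaded from memory, but we have not verified the generated instructions; compiling with gcc -O2 -mavx2 -S and inspecting the output is the way to check.

```c
#include <assert.h>
#include <string.h>

/* GCC vector extension: 4 x int = 16 bytes (V4SImode).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Fill 16 bytes at DST with 0x01 using a uniform vector constant.  */
void
fill16 (void *dst)
{
  const v4si splat = { 0x01010101, 0x01010101, 0x01010101, 0x01010101 };
  memcpy (dst, &splat, sizeof splat);   /* DST need not be aligned */
}
```

Every byte of 0x01010101 is 0x01, so the result is the same on either endianness.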

Re: [COMMITTED V10 3/7] CTF/BTF debug formats

2021-06-28 Thread David Edelsohn via Gcc-patches
bootstrap: Include tm_p.h in btfout.c and ctfout.c.

btfout.c and ctfout.c reference target-specific macros that
may reference target-specific functions that are declared in a
target-specific header.  tm_p.h must be included to access the
target-specific header.

Bootstrapped on powerpc-ibm-aix7.2.3.0.  Committed as obvious.

Thanks, David

gcc/ChangeLog:

* btfout.c: Include tm_p.h
* ctfout.c: Same.

diff --git a/gcc/btfout.c b/gcc/btfout.c
index 45954b4b7b9..2316dea5f27 100644
--- a/gcc/btfout.c
+++ b/gcc/btfout.c
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "target.h"
+#include "tm_p.h"
 #include "output.h"
 #include "dwarf2asm.h"
 #include "debug.h"
diff --git a/gcc/ctfout.c b/gcc/ctfout.c
index c264fd6661a..71d7a62e6ef 100644
--- a/gcc/ctfout.c
+++ b/gcc/ctfout.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "target.h"
+#include "tm_p.h"
 #include "output.h"
 #include "dwarf2asm.h"
 #include "debug.h"


[PATCH] The upper bits of FIXUPIMMS{S, D} should come from src1 not dest.

2021-06-28 Thread liuhongt via Gcc-patches
Hi:
  Currently the vfixupimm{s,d} patterns keep the upper bits of dest unchanged,
which is wrong: the upper bits of dest should come from src1 (operands[2] in
the pattern).

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
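To make the semantic change concrete, here is a hypothetical C model (not GCC code) of the 128-bit scalar double case. It only models where each result element comes from: element 0 is the fixup result LOW (the fixup computation itself is not modelled), and the upper element comes from dest before the patch and from src1 (operands[2]) after it. The zero-masking branch mirrors the new maskz_scalar subst, which zeroes element 0 when mask bit 0 is clear.

```c
#include <assert.h>

/* Two double elements, standing in for a V2DF register.  */
typedef struct { double e[2]; } v2df_model;

/* Old pattern: upper element kept from the destination.  */
static v2df_model
merge_old (double low, v2df_model dest)
{
  v2df_model r = dest;
  r.e[0] = low;
  return r;
}

/* Fixed pattern: upper element copied from src1.  With zero-masking
   and mask bit 0 clear, element 0 becomes 0.0.  */
static v2df_model
merge_new (double low, v2df_model src1, int zero_mask_p, unsigned mask)
{
  v2df_model r = src1;
  r.e[0] = (zero_mask_p && !(mask & 1u)) ? 0.0 : low;
  return r;
}
```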

gcc/ChangeLog:

PR target/101248
* config/i386/sse.md
(avx512f_sfixupimm):
Refined.
(avx512f_sfixupimm):
Ditto.
* config/i386/subst.md (maskz_scalar): New define_subst.
(maskz_scalar_name): New subst_attr.
(maskz_scalar_op5): Ditto.
(round_saeonly_maskz_scalar_op5): Ditto.
(round_saeonly_maskz_scalar_operand5): Ditto.

gcc/testsuite/ChangeLog

PR target/101248
* gcc.target/i386/pr101248.c: New test.
---
 gcc/config/i386/sse.md   |   8 +-
 gcc/config/i386/subst.md |  21 
 gcc/testsuite/gcc.target/i386/pr101248.c | 123 +++
 3 files changed, 148 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101248.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ffcc0c81964..d3f5a74f763 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -9942,7 +9942,7 @@
   DONE;
 })
 
-(define_insn "avx512f_sfixupimm"
+(define_insn "avx512f_sfixupimm"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
(vec_merge:VF_128
   (unspec:VF_128
@@ -9951,10 +9951,10 @@
 (match_operand: 3 
"" "")
 (match_operand:SI 4 "const_0_to_255_operand")]
UNSPEC_FIXUPIMM)
- (match_dup 1)
+ (match_dup 2)
  (const_int 1)))]
"TARGET_AVX512F"
-   "vfixupimm\t{%4, %3, %2, 
%0|%0, %2, %3, %4}";
+   "vfixupimm\t{%4, %3, 
%2, %0|%0, %2, 
%3, %4}";
[(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
@@ -9968,7 +9968,7 @@
(match_operand: 3 
"" "")
(match_operand:SI 4 "const_0_to_255_operand")]
   UNSPEC_FIXUPIMM)
-   (match_dup 1)
+   (match_dup 2)
(const_int 1))
  (match_dup 1)
  (match_operand: 5 "register_operand" "Yk")))]
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 477a89803fa..6614e044857 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -117,6 +117,25 @@
 (match_operand: 3 "register_operand" "Yk")))
 ])
 
+(define_subst_attr "maskz_scalar_name" "maskz_scalar" "" "_maskz_1")
+(define_subst_attr "maskz_scalar_op5" "maskz_scalar" "" "%{%6%}%N5")
+
+(define_subst "maskz_scalar"
+  [(set (match_operand:SUBST_V 0)
+   (vec_merge:SUBST_V
+ (match_operand:SUBST_V 1)
+ (match_operand:SUBST_V 2)
+ (const_int 1)))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+   (vec_merge:SUBST_V
+ (vec_merge:SUBST_V
+   (match_dup 1)
+   (match_operand:SUBST_V 3 "const0_operand" "C")
+   (match_operand: 4 "register_operand" "Yk"))
+ (match_dup 2)
+ (const_int 1)))])
+
 (define_subst_attr "round_name" "round" "" "_round")
 (define_subst_attr "round_mask_operand2" "mask" "%R2" "%R4")
 (define_subst_attr "round_mask_operand3" "mask" "%R3" "%R5")
@@ -163,6 +182,7 @@
 (define_subst_attr "round_saeonly_mask_operand3" "mask" "%r3" "%r5")
 (define_subst_attr "round_saeonly_mask_operand4" "mask" "%r4" "%r6")
 (define_subst_attr "round_saeonly_mask_scalar_merge_operand4" 
"mask_scalar_merge" "%r4" "%r5")
+(define_subst_attr "round_saeonly_maskz_scalar_operand5" "maskz_scalar" "%r5" 
"%r7")
 (define_subst_attr "round_saeonly_sd_mask_operand5" "sd" "%r5" "%r7")
 (define_subst_attr "round_saeonly_op2" "round_saeonly" "" "%r2")
 (define_subst_attr "round_saeonly_op3" "round_saeonly" "" "%r3")
@@ -175,6 +195,7 @@
 (define_subst_attr "round_saeonly_mask_op4" "round_saeonly" "" 
"")
 (define_subst_attr "round_saeonly_mask_scalar_merge_op4" "round_saeonly" "" 
"")
 (define_subst_attr "round_saeonly_sd_mask_op5" "round_saeonly" "" 
"")
+(define_subst_attr "round_saeonly_maskz_scalar_op5" "round_saeonly" "" 
"")
 (define_subst_attr "round_saeonly_mask_arg3" "round_saeonly" "" ", 
operands[]")
 (define_subst_attr "round_saeonly_constraint" "round_saeonly" "vm" "v")
 (define_subst_attr "round_saeonly_constraint2" "round_saeonly" "m" "v")
diff --git a/gcc/testsuite/gcc.target/i386/pr101248.c 
b/gcc/testsuite/gcc.target/i386/pr101248.c
new file mode 100644
index 000..f5ac94f5769
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101248.c
@@ -0,0 +1,123 @@
+/* PR target/101248  */
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl -std=gnu99" } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-require-effective-target c99_runtime } */
+
+#define AVX512VL
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 64)
+#include "avx512f-mask-type.h"
+#include "math_m_pi.h"
+#include "float.h"
+
+
+static void
+CALC (double *r, double dest, double src, long long tb

[PATCH v3] fixinc: don't "fix" machine names in __has_include(...) [PR91085]

2021-06-28 Thread Xi Ruoyao via Gcc-patches
v3:
  use memmem/memchr instead of trivial loops
  split most of the logic into a static function
  avoid hardcoded magic number
  adjust test

fixincludes/

* fixfixes.c (check_has_inc): New static function.
  (machine_name_fix): Don't replace header names in
  __has_include(...).
* inclhack.def (machine_name): Adjust test.
* tests/base/testing.h: Update.
---
 fixincludes/fixfixes.c   | 45 ++--
 fixincludes/inclhack.def |  3 ++-
 fixincludes/tests/base/testing.h |  2 +-
 3 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/fixincludes/fixfixes.c b/fixincludes/fixfixes.c
index 5b23a8b640d..404b420f302 100644
--- a/fixincludes/fixfixes.c
+++ b/fixincludes/fixfixes.c
@@ -477,6 +477,39 @@ FIX_PROC_HEAD( char_macro_def_fix )
   fputs (text, stdout);
 }
 
+/* Check if the pattern at pos is actually in a "__has_include(...)"
+   directive.  Return the pointer to the ')' of this
+   "__has_include(...)" if it is, NULL otherwise.  */
+static const char *
+check_has_inc (const char *begin, const char *pos, const char *end)
+{
+  static const char has_inc[] = "__has_include";
+  const size_t has_inc_len = sizeof (has_inc) - 1;
+  const char *p;
+
+  for (p = memmem (begin, pos - begin, has_inc, has_inc_len);
+       p != NULL;
+       p = memmem (p, pos - p, has_inc, has_inc_len))
+    {
+      p += has_inc_len;
+      while (p < end && ISSPACE (*p))
+        p++;
+
+      /* "__has_include" may appear as "defined(__has_include)",
+         search for the next appearance then.  */
+      if (*p != '(')
+        continue;
+
+      /* To avoid too much complexity, just hope there is never a
+         ')' in a header name.  */
+      p = memchr (p, ')', end - p);
+      if (p == NULL || p > pos)
+        return p;
+    }
+
+  return NULL;
+}
+
 /* Fix for machine name #ifdefs that are not in the namespace reserved
by the C standard.  They won't be defined if compiling with -ansi,
and the headers will break.  We go to some trouble to only change
@@ -524,7 +557,7 @@ FIX_PROC_HEAD( machine_name_fix )
   /* If the 'name_pat' matches in between base and limit, we have
  a bogon.  It is not worth the hassle of excluding comments
  because comments on #if/#ifdef lines are rare, and strings on
- such lines are illegal.
+ such lines are only legal in a "__has_include" directive.
 
  REG_NOTBOL means 'base' is not at the beginning of a line, which
  shouldn't matter since the name_re has no ^ anchor, but let's
@@ -544,8 +577,16 @@ FIX_PROC_HEAD( machine_name_fix )
 break;
 
   p = base + match[0].rm_so;
-  base += match[0].rm_eo;
 
+  /* Check if the match is in __has_include(...) (PR 91085). */
+  q = check_has_inc (base, p, limit);
+  if (q) 
+    {
+      base = q + 1;
+      goto again;
+    }
+
+  base += match[0].rm_eo;
   /* One more test: if on the same line we have the same string
  with the appropriate underscores, then leave it alone.
  We want exactly two leading and trailing underscores.  */
diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
index 3a4cfe06542..4db311713ef 100644
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -3151,7 +3151,8 @@ fix = {
 c_fix = machine_name;
 
 test_text = "/* MACH_DIFF: */\n"
-"#if defined( i386 ) || defined( sparc ) || defined( vax )"
+"#if defined( i386 ) || defined( sparc ) || defined( vax ) || "
+"defined( linux ) || __has_include (  )"
 "\n/* no uniform test, so be careful  :-) */";
 };
 
diff --git a/fixincludes/tests/base/testing.h b/fixincludes/tests/base/testing.h
index cf95321fb86..8b3accaf04e 100644
--- a/fixincludes/tests/base/testing.h
+++ b/fixincludes/tests/base/testing.h
@@ -64,7 +64,7 @@ BSD43__IOWR('T', 1) /* Some are multi-line */
 
 #if defined( MACHINE_NAME_CHECK )
 /* MACH_DIFF: */
-#if defined( i386 ) || defined( sparc ) || defined( vax )
+#if defined( i386 ) || defined( sparc ) || defined( vax ) || defined( linux ) 
|| __has_include (  )
 /* no uniform test, so be careful  :-) */
 #endif  /* MACHINE_NAME_CHECK */
 
-- 
2.32.0
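The matching logic of check_has_inc can be exercised outside fixincludes. The sketch below is a hypothetical standalone re-statement, not the patch itself: it substitutes a small portable search (find_sub) for memmem, which is a GNU/BSD extension, uses isspace in place of libiberty's ISSPACE, and adds two defensive bounds checks (p == end, p < pos) for arbitrary input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Portable stand-in for memmem: first NEEDLE in HAY[0..HLEN), or NULL.  */
static const char *
find_sub (const char *hay, size_t hlen, const char *needle, size_t nlen)
{
  for (size_t i = 0; i + nlen <= hlen; i++)
    if (memcmp (hay + i, needle, nlen) == 0)
      return hay + i;
  return NULL;
}

/* If POS lies inside "__has_include(...)" in [BEGIN, END), return a
   pointer to its closing ')'; otherwise return NULL.  */
static const char *
in_has_include (const char *begin, const char *pos, const char *end)
{
  static const char has_inc[] = "__has_include";
  const size_t has_inc_len = sizeof (has_inc) - 1;
  const char *p;

  for (p = find_sub (begin, (size_t) (pos - begin), has_inc, has_inc_len);
       p != NULL;
       p = (p < pos)
           ? find_sub (p, (size_t) (pos - p), has_inc, has_inc_len)
           : NULL)
    {
      p += has_inc_len;
      while (p < end && isspace ((unsigned char) *p))
        p++;
      /* Bare "__has_include", e.g. inside defined(__has_include).  */
      if (p == end || *p != '(')
        continue;
      /* Like the patch, assume no ')' inside the header name.  */
      p = memchr (p, ')', (size_t) (end - p));
      if (p == NULL || p > pos)
        return p;
    }
  return NULL;
}
```

A caller in the style of machine_name_fix would skip past the returned ')' instead of rewriting the machine name found inside the header name.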