Re: Byte swapping support

2017-09-15 Thread Florian Weimer
* Eric Botcazou:

>> To support applications that assume big-endian memory layout on little-
>> endian systems, I'm considering adding support for reversing the
>> storage order to GCC.
>
> That was also the goal of the scalar_storage_order attribute.

By the way, what happened to the C++ bits?  I think the c-family patch
which went in assumes that the C++ bits are there as well.


Re: Byte swapping support

2017-09-15 Thread Eric Botcazou
> By the way, what happened to the C++ bits?  I think the c-family patch
> which went in assumes that the C++ bits are there as well.

I don't really understand what you mean by "assume" here, but the C++ bits 
were incomplete and never got reviewed; I can resurrect them if there is some 
interest though.

-- 
Eric Botcazou


Re: Infering that the condition of a for loop is initially true?

2017-09-15 Thread Richard Biener
On Fri, Sep 15, 2017 at 12:07 AM, Jeff Law  wrote:
> On 09/14/2017 01:28 PM, Niels Möller wrote:
>> This is more of a question than a bug report, so I'm trying to send it
>> to the list rather than filing a bugzilla issue.
>>
>> I think it's quite common to write for- and while-loops where the
>> condition is always initially true. A simple example might be
>>
>> double average (const double *a, size_t n)
>> {
>>   double sum;
>>   size_t i;
>>
>>   assert (n > 0);
>>   for (i = 0, sum = 0; i < n; i++)
>>     sum += a[i];
>>   return sum / n;
>> }
>>
>> The programmer could do the micro-optimization of rewriting it as a
>> do-while-loop instead. It would be nice if gcc could infer that the
>> condition is initially true, and convert to a do-while loop
>> automatically.
>>
>> Converting to a do-while-loop should produce slightly better code,
>> omitting the typical jump to enter the loop at the end where the
>> condition is checked. It would also make analysis of where variables are
>> written more accurate, which is my main concern at the moment.
>>
>> My questions are:
>>
>> 1. Does gcc attempt to do this optimization?
> Yes.  It happens as a side effect of jump threading and there are also
> dedicated passes to rotate the loop.

The loop header copying does this.
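
As a rough illustration of what loop header copying amounts to, here is a
hand-written sketch in C.  The names average_for and average_rotated are made
up for this example, and the rotated form only approximates what the pass
does internally; it is not taken from GCC's output.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the condition is tested before the first iteration. */
double average_for(const double *a, size_t n)
{
  double sum = 0;
  size_t i;

  assert(n > 0);
  for (i = 0; i < n; i++)
    sum += a[i];
  return sum / n;
}

/* Hand-rotated form: the loop header test is copied in front of the loop
   as a guard, and the loop body becomes a do-while.  If the compiler can
   prove the guard is always true, the guard can be removed as well. */
double average_rotated(const double *a, size_t n)
{
  double sum = 0;
  size_t i = 0;

  assert(n > 0);
  if (i < n)                    /* copied header (guard) */
    {
      do
        sum += a[i++];
      while (i < n);            /* condition now checked at the bottom */
    }
  return sum / n;
}
```

Both functions are expected to return the same value for any non-empty
input; only the shape of the loop differs.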

>>
>> 2. If it does, how often does it succeed on loops in real programs?
> Often.  The net benefit is actually small though and sometimes this kind
> of loop rotation can impede vectorization.

Most loop optimizers do not deal well with loops that may not execute
or rather they do not handle number of iterations being computed as
"n or zero".  The vectorizer is an exception to this as it will simply
version the loop with the extra condition(s).

>
>>
>> 3. Can I help the compiler to do that inference?
> In general, I'd advise against it.  You end up with ugly code which
> works with specific versions of the compiler, but which needs regular
> tweaking as the internal implementations of various optimizers change
> over time.
>
>
>>
>> The code I had some trouble with is at
>> https://git.lysator.liu.se/nettle/nettle/blob/master/ecc-mod.c. A
>> simplified version with only the interesting code path would be
>>
>> void
>> ecc_mod (mp_size_t mn, mp_size_t bn, mp_limb_t *rp)
>> {
>>   mp_limb_t hi;
>>   mp_size_t sn = mn - bn;
>>   mp_size_t rn = 2*mn;
>>
>>   assert (bn < mn);
>>
>>   while (rn >= 2 * mn - bn)
> In this particular case (ignoring the assert), what you want is better
> jump threading exploiting range propagation.   But you have to be real
> careful here due to the potential overflow.
>
> I'd have to have a self-contained example to dig into what's really
> going on, but my suspicion is either overflow or fairly weak range data
> and simplification due to the symbolic ranges.
>
> Jeff
>


Re: Byte swapping support

2017-09-15 Thread Florian Weimer
* Eric Botcazou:

>> By the way, what happened to the C++ bits?  I think the c-family patch
>> which went in assumes that the C++ bits are there as well.
>
> I don't really understand what you mean by "assume" here,

handle_pragma_scalar_storage_order does not check c_dialect_cxx, so it
will not issue a warning for C++ even though the pragma is effectively
ignored.


Re: Byte swapping support

2017-09-15 Thread Eric Botcazou
> handle_pragma_scalar_storage_order does not check c_dialect_cxx, so it
> will not issue a warning for C++ even though the pragma is effectively
> ignored.

Indeed, unlike for the attribute, will fix, thanks.

-- 
Eric Botcazou


Re: [RFC] type promotion pass

2017-09-15 Thread Wilco Dijkstra
Hi Prathamesh,

I've tried out the latest version and it works really well. It built and ran 
SPEC2017 without any issues or regressions (I didn't do a detailed comparison 
which would mean multiple runs, however a single run showed performance is 
pretty much the same on INT and 0.1% faster on FP). 

Codesize reduces in almost all cases (only xalancbmk increases by 600 bytes), 
sometimes by a huge amount. For example in gcc_r around 20% of all AND 
immediate instructions are removed, clear proof it removes many redundant 
zero/sign extensions.

So consider this a big +1 from me! GCC is behind other compilers with respect 
to this kind of optimization and it looks like this phase does a major catchup. 
Like I mentioned, it doesn't have to be 100% perfect; once it has been 
committed, we can fine-tune it and add more optimizations.

Wilco


Re: [RFC] type promotion pass

2017-09-15 Thread David Edelsohn
On Tue, Sep 5, 2017 at 5:26 AM, Prathamesh Kulkarni
 wrote:
> Hi,
> I have attached revamped version of Kugan's original patch for type promotion
> (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00472.html)
> rebased on r249469. The motivation of the pass is to minimize
> generation of subregs
> to avoid redundant zero/sign extensions by carrying out computations
> in PROMOTE_MODE
> as much as possible on tree-level.
>
> * Working of pass
> The pass is dominator-based, and tries to promote type of def to
> PROMOTE_MODE in each gimple stmt. Before beginning domwalk, all the
> default definitions are promoted to PROMOTE_MODE
> in promote_all_ssa_defined_with_nop (). The patch adds a new tree code
> SEXT_EXPR to represent sign-extension on tree level. CONVERT_EXPR is
> either replaced by an explicit sign or zero extension depending on the
> signedness of operands.
>
> The core of the pass is in following two routines:
> a) promote_ssa: This pass looks at the def of gimple_stmt, and
> promotes type of def to promoted_type in-place if possible. If it
> cannot promote def in-place, then
> it transforms:
> def = def_stmt
> to
> new_def = def_stmt
> def = convert_expr new_def
> where new_def is a clone of def, and type of def is set to promoted_type.
>
> b) fixup_use: The main intent is to "fix" uses of a promoted variable
> to preserve semantics
> of the code, for instance if the variable is used in stmt where its
> original type is required.
> Another case is when def is not promoted by promote_ssa, but some uses
> could be promoted.
>
> promote_all_stmts () is the driver function that calls fixup_use and
> promote_ssa for each stmt
> within the basic block. The pass relies extensively on dom and vrp to
> remove redundancies generated by the pass and is thus enabled only if
> vrp is enabled.
>
> Issues:
> 1] Pass ordering: type-promote pass generates too many redundancies
> which can hamper other optimizations. One case I observed was on arm
> when it inhibited path splitting optimization because the number of
> stmts in basic block exceeded the value of
> param-max-jump-thread-duplication-stmts. So I placed the pass just
> before dom. I am not sure if this is a good approach. Maybe the pass
> itself
> should avoid generating redundant statements and not rely too much on
> dom and vrp ?
>
> 2] Redundant copies left after the pass: When it's safe, vrp optimizes
> _1 = op1 sext op2
> into
> _1 = op1
>
> which leaves redundant copies that are not optimized away at GIMPLE level.
> I placed pass_copy_prop after vrp to eliminate these copies, but I am not
> sure if that is ideal. Maybe we should do this during vrp itself, as this
> is the only case that generates redundant copies?
>
> 3] Backward phi's: Since we traverse in dominated order, fixup_use()
> gets called first on a ssa_name that is an argument of a backward-phi.
> If it promotes the type and later if promote_ssa() decides that the
> ssa_name should not be promoted, then we have to again walk the
> backward phi's to undo the "incorrect" promotion, which is done by
> emitting a convert_expr back to the original type from promoted_type.
> While I suppose it shouldn't affect correctness, it generates
> redundant casts and was wondering if there's a better approach to
> handle this issue.
>
> * SPEC2k6 benchmarking:
>
> Results of  benchmarking the patch for aarch64-linux-gnu cortex-a57:
>
> Improved:
> 401.bzip2      +2.11%
> 459.GemsFDTD   +1.42%
> 464.h264ref    +1.96%
> 471.omnetpp    +1.05%
> 481.wrf        +0.99%
>
> Regressed:
> 447.dealII     -1.34%
> 450.soplex     -1.54%
> 456.hmmer      -3.79%
> 482.sphinx3    -2.95%
>
> The remaining benchmarks didn't have much difference. However there
> was some noise
> in the above run, and I suppose the numbers are not precise.
> I will give another run with benchmarking. There was no significant
> difference in code-size with and without patch for SPEC2k6.
>
> * Validation
>
> The patch passes bootstrap+test on x86_64-unknown-linux-gnu,
> arm-linux-gnueabihf,
> aarch64-linux-gnu and ppc64le-linux-gnu. On arm-linux-gnueabihf, and
> aarch64-linux-gnu, there is following fallout:
>
> 1] gcc.dg/attr-alloc_size-11.c missing range info for signed char
> (test for warnings, line 50)
> The issue seems to be that we call reset_flow_sensitive_info(def) if
> def is promoted and that invalidates value range which is probably
> causing the regression.
>
> 2] gcc.dg/fold-narrowbopcst-1.c scan-tree-dump optimized " = _.* + 156"
> This requires adjusting the scan-dump.
>
> On ppc64le-linux-gnu, I am observing several regressions in the
> testsuite. Most of these seem to be because type-promotion is
> interfering with other optimizations, especially widening_mul and
> bswap pass. Also observed fallout for some cases due to interference
> with strlen, tail-call and jump-threading passes. I suppose I am not
> seeing these on arm/aarch64 because ppc64 defines
> PROMOTE_MODE to be DImode and

Re: [RFC] type promotion pass

2017-09-15 Thread Wilco Dijkstra
David Edelsohn wrote:

> Why does AArch64 define PROMOTE_MODE as SImode?  GCC ports for other
> RISC targets mostly seem to use a 64-bit mode.  Maybe SImode is the
> correct definition based on the current GCC optimization
> infrastructure, but this seems like a change that should be applied to
> all 64 bit RISC targets.

The reason is that AArch64 supports both 32-bit and 64-bit registers, so when
using char/short you want 32-bit operations. There is an issue in that
WORD_REGISTER_OPERATIONS isn't set on AArch64, but it should be. Maybe that
requires some cleanups to ensure it interacts correctly with PROMOTE_MODE.
There are way too many confusing target defines like this and no general
mechanism that just works like you'd expect. Promoting to an orthogonal set
of registers is not something particularly unusual, so it's something GCC
should support well by default...

Wilco



Re: [RFC] type promotion pass

2017-09-15 Thread Jeff Law
On 09/15/2017 07:47 AM, Wilco Dijkstra wrote:
> David Edelsohn wrote:
> 
>> Why does AArch64 define PROMOTE_MODE as SImode?  GCC ports for other
>> RISC targets mostly seem to use a 64-bit mode.  Maybe SImode is the
>> correct definition based on the current GCC optimization
>> infrastructure, but this seems like a change that should be applied to
>> all 64 bit RISC targets.
> 
> The reason is that AArch64 supports both 32-bit and 64-bit registers, so when
> using char/short you want 32-bit operations. There is an issue in that
> WORD_REGISTER_OPERATIONS isn't set on AArch64, but it should be. Maybe that
> requires some cleanups to ensure it interacts correctly with PROMOTE_MODE.
> There are way too many confusing target defines like this and no general
> mechanism that just works like you'd expect. Promoting to an orthogonal set
> of registers is not something particularly unusual, so it's something GCC
> should support well by default...
Note this ties in directly with the conversation Steve and I have been
having in another thread.

WORD_REGISTER_OPERATIONS works with PROMOTE_MODE.  The reason you can't
define WORD_REGISTER_OPERATIONS on aarch64 is that the implicit
promotion is sometimes to 32 bits and sometimes to 64 bits.
WORD_REGISTER_OPERATIONS can't really describe that.

Ideally the way forward is to address that limitation of
WORD_REGISTER_OPERATIONS which will eliminate a large number of zero
extensions.

I also think improving REE would help -- in particular having it handle
subregs which are just another way of expressing an extension.  I
suspect that would also allow folding away a goodly amount of extensions.

And I'm also keen on doing something with type promotion -- Kai did some
work in this space years ago which I found interesting, even if the work
didn't go forward.  It showed a real weakness.  So I'm certainly
interested in looking at Prathamesh's work -- with the caveat that if it
stumbles across the same issues as Kai's work that it likely wouldn't be
acceptable in its current form.

Jeff



Re: [RFC] type promotion pass

2017-09-15 Thread Segher Boessenkool
On Fri, Sep 15, 2017 at 09:18:23AM -0600, Jeff Law wrote:
> WORD_REGISTER_OPERATIONS works with PROMOTE_MODE.  The reason you can't
> define WORD_REGISTER_OPERATIONS on aarch64 is that the implicit
> promotion is sometimes to 32 bits and sometimes to 64 bits.
> WORD_REGISTER_OPERATIONS can't really describe that.

WORD_REGISTER_OPERATIONS isn't well-defined.

"""
@defmac WORD_REGISTER_OPERATIONS
Define this macro to 1 if operations between registers with integral mode
smaller than a word are always performed on the entire register.
Most RISC machines have this property and most CISC machines do not.
@end defmac
"""

Exactly what operations?  For almost all targets it isn't true for *all*
operations.  Or no targets even, if you include rotate, etc.

For targets that have both 32-bit and 64-bit operations it is never true
either.

> And I'm also keen on doing something with type promotion -- Kai did some
> work in this space years ago which I found interesting, even if the work
> didn't go forward.  It showed a real weakness.  So I'm certainly
> interested in looking at Prathamesh's work -- with the caveat that if it
> stumbles across the same issues as Kai's work that it likely wouldn't be
> acceptable in its current form.

Doing type promotion too aggressively reduces code quality.  "Just" find
a sweet spot :-)

Example: on Power, an AND of QImode with 0xc3 is just one insn, which
actually does a SImode AND with 0xffc3.  This is what we do currently.
A SImode AND with 0x00c3 is two insns, or one if we allow it to write
to CR0 as well ("andi."); same for DImode, except there isn't a way to do
an AND with 0xffc3 in one insn at all.

unsigned char a;
void f(void) { a &= 0xc3; };
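
The point about the upper mask bits can be checked with a small C sketch.
This is only an illustration under assumptions: and_low8 and demo_and are
made-up names, the 32-bit "register" is modelled with uint32_t, and the two
wide masks are chosen for the example rather than taken from any Power
instruction encoding.  It shows that, when only the low byte is stored back,
a wide AND whose mask has the upper bits set gives the same byte as one whose
mask has them clear, so the target is free to pick whichever is cheaper.

```c
#include <assert.h>
#include <stdint.h>

/* Low 8 bits of a wide AND.  'reg' models a 32-bit register whose upper
   24 bits may hold anything when the value is really an unsigned char. */
static uint8_t and_low8(uint32_t reg, uint32_t mask)
{
  return (uint8_t)(reg & mask);
}

/* Usage: both wide masks agree on the byte that is stored back. */
static void demo_and(void)
{
  uint32_t reg = 0xdeadbeffu;   /* arbitrary upper bits, low byte 0xff */

  assert(and_low8(reg, 0xffffffc3u) == and_low8(reg, 0x000000c3u));
}
```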


Segher


Re: [RFC] type promotion pass

2017-09-15 Thread Jeff Law
On 09/15/2017 10:19 AM, Segher Boessenkool wrote:
> On Fri, Sep 15, 2017 at 09:18:23AM -0600, Jeff Law wrote:
>> WORD_REGISTER_OPERATIONS works with PROMOTE_MODE.  The reason you can't
>> define WORD_REGISTER_OPERATIONS on aarch64 is that the implicit
>> promotion is sometimes to 32 bits and sometimes to 64 bits.
>> WORD_REGISTER_OPERATIONS can't really describe that.
> 
> WORD_REGISTER_OPERATIONS isn't well-defined.
> 
> """
> @defmac WORD_REGISTER_OPERATIONS
> Define this macro to 1 if operations between registers with integral mode
> smaller than a word are always performed on the entire register.
> Most RISC machines have this property and most CISC machines do not.
> @end defmac
> """
> 
> Exactly what operations?  For almost all targets it isn't true for *all*
> operations.  Or no targets even, if you include rotate, etc.
> 
> For targets that have both 32-bit and 64-bit operations it is never true
> either.
> 
>> And I'm also keen on doing something with type promotion -- Kai did some
>> work in this space years ago which I found interesting, even if the work
>> didn't go forward.  It showed a real weakness.  So I'm certainly
>> interested in looking at Prathamesh's work -- with the caveat that if it
>> stumbles across the same issues as Kai's work that it likely wouldn't be
>> acceptable in its current form.
> 
> Doing type promotion too aggressively reduces code quality.  "Just" find
> a sweet spot :-)
> 
> Example: on Power, an AND of QImode with 0xc3 is just one insn, which
> actually does a SImode AND with 0xffc3.  This is what we do currently.
> A SImode AND with 0x00c3 is two insns, or one if we allow it to write
> to CR0 as well ("andi."); same for DImode, except there isn't a way to do
> an AND with 0xffc3 in one insn at all.
> 
> unsigned char a;
> void f(void) { a &= 0xc3; };
Yes, these are some of the things we kicked around.  One of the most
interesting conclusions was that for these target issues we'd really
like a target.pd file to handle this class of transformations just prior
to rtl expansion.

Essentially early type promotion/demotion would be concerned with cases
where we can eliminate operations in a target independent manner and
narrow operands as much as possible.  Late promotion/demotion would deal
with stuff like the target's desire to work on specific sized hunks in
specific contexts.

I'm greatly oversimplifying here.  Type promotion/demotion is fairly
complex to get right.

jeff


Re: [RFC] type promotion pass

2017-09-15 Thread Segher Boessenkool
On Fri, Sep 15, 2017 at 10:56:04AM -0600, Jeff Law wrote:
> Yes, these are some of the things we kicked around.  One of the most
> interesting conclusions was that for these target issues we'd really
> like a target.pd file to handle this class of transformations just prior
> to rtl expansion.

But often combine will need to know about which bits you actually care
about (and other passes too, but combine is the biggie).

> Essentially early type promotion/demotion would be concerned with cases
> where we can eliminate operations in a target independent manner and
> narrow operands as much as possible.  Late promotion/demotion would deal
> with stuff like the target's desire to work on specific sized hunks in
> specific contexts.
> 
> I'm greatly oversimplifying here.  Type promotion/demotion is fairly
> complex to get right.

Yeah :-(

Maybe the best thing is to promote really early, but to keep track of which
bits matter.  And then adjust some passes to take that into account.  Not a
trivial amount of work.


Segher


Pierre-Marie de Rodat appointed Ada co-maintainer

2017-09-15 Thread David Edelsohn
I am pleased to announce that the GCC Steering Committee has
appointed Pierre-Marie de Rodat as Ada co-maintainer.

Please join me in congratulating Pierre-Marie on his new role.
P-M, please update your listing in the MAINTAINERS file.

Happy hacking!
David



Re: [RFC] type promotion pass

2017-09-15 Thread Jakub Jelinek
On Fri, Sep 15, 2017 at 12:13:39PM -0500, Segher Boessenkool wrote:
> On Fri, Sep 15, 2017 at 10:56:04AM -0600, Jeff Law wrote:
> > Yes, these are some of the things we kicked around.  One of the most
> > interesting conclusions was that for these target issues we'd really
> > like a target.pd file to handle this class of transformations just prior
> > to rtl expansion.
> 
> But often combine will need to know about which bits you actually care
> about (and other passes too, but combine is the biggie).
> 
> > Essentially early type promotion/demotion would be concerned with cases
> > where we can eliminate operations in a target independent manner and
> > narrow operands as much as possible.  Late promotion/demotion would deal
> > with stuff like the target's desire to work on specific sized hunks in
> > specific contexts.
> > 
> > I'm greatly oversimplifying here.  Type promotion/demotion is fairly
> > complex to get right.
> 
> Yeah :-(
> 
> Maybe the best thing is to promote really early, but to keep track of which
> bits matter.  And then adjust some passes to take that into account.  Not a
> trivial amount of work.

Is type promotion actually what we want to do early?  I'd think type
demotion is what better canonicalizes the IL and removes redundant
operations (e.g. those affecting only high bits if we only care about low
bits).
Then another spot is the vectorizer, where we ideally should be promoting or
demoting types such that we have as few different type bitsizes in the loop.
And then somewhere late we should in a target driven way decide what is the
optimal type to perform operations (promote or leave unpromoted).

Jakub


Re: [RFC] type promotion pass

2017-09-15 Thread Segher Boessenkool
On Fri, Sep 15, 2017 at 08:40:41PM +0200, Jakub Jelinek wrote:
> > > I'm greatly oversimplifying here.  Type promotion/demotion is fairly
> > > complex to get right.
> > 
> > Yeah :-(
> > 
> > Maybe the best thing is to promote really early, but to keep track of which
> > bits matter.  And then adjust some passes to take that into account.  Not a
> > trivial amount of work.
> 
> Is type promotion actually what we want to do early?  I'd think type
> demotion is what better canonicalizes the IL and removes redundant
> operations (e.g. those affecting only high bits if we only care about low
> bits).

On gimple we already have smallest type possible, I think?  When expanding
to RTL that then needs to only use instructions that exist for the target.
And then problems happen -- we only have instructions that work on full
registers on many targets (or also on 32-bit items), but we do not care
about the higher bits in some cases, or we only need it to be sign/zero
extended and we do not need a separate extend insn in many cases (but the
RTL passes do not realise that).

Or do you see problems during gimple as well?


Segher


Re: [RFC] type promotion pass

2017-09-15 Thread Jakub Jelinek
On Fri, Sep 15, 2017 at 02:06:22PM -0500, Segher Boessenkool wrote:
> On Fri, Sep 15, 2017 at 08:40:41PM +0200, Jakub Jelinek wrote:
> > > > I'm greatly oversimplifying here.  Type promotion/demotion is fairly
> > > > complex to get right.
> > > 
> > > Yeah :-(
> > > 
> > > Maybe the best thing is to promote really early, but to keep track of which
> > > bits matter.  And then adjust some passes to take that into account.  Not a
> > > trivial amount of work.
> > 
> > Is type promotion actually what we want to do early?  I'd think type
> > demotion is what better canonicalizes the IL and removes redundant
> > operations (e.g. those affecting only high bits if we only care about low
> > bits).
> 
> On gimple we already have smallest type possible, I think?  When expanding

It is worse than you think then.  We don't.  Just try:
int foo (__int128 x, __int128 y)
{
  __int128 z, a, b;
  z = x + y;
  a = z * 4;
  b = a - x;
  return b;
}

int bar (long long x, int y)
{
  x *= y;
  x += 0xffeb00LL;
  return x;
}

We keep the user selected types up to expansion, e.g. the x += 0xffeb00LL;
is completely useless etc.  If we demoted, we could
perform all the computation on unsigned int.  The only place we have
demotion is the get_unwidened etc. stuff that is done in convert.c, but that
is done only in the FEs and thus only happens within the same statement.
There is nothing that repeats that on GIMPLE.

A disadvantage of demotion (either the one we do in the FEs or the one we
don't do on GIMPLE) is that for signed wider arithmetic, if we demote we
need unsigned arithmetic.
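
A minimal sketch of what such a demotion could look like when done by hand.
Everything here is an assumption for illustration: the names wide, demoted
and demo_demotion are made up, and HIGH_ONLY is a hypothetical constant
chosen so that it only touches bits 32 and above.  The narrowing conversions
to int are implementation-defined in C when the value is out of range; GCC
documents them as reduction modulo 2^32, which is what the sketch relies on.

```c
#include <assert.h>

/* Hypothetical constant with only bits 32 and above set. */
#define HIGH_ONLY (0x12345LL << 32)

/* Wide form: everything is computed in long long, then truncated. */
int wide(long long x, int y)
{
  x *= y;
  x += HIGH_ONLY;               /* cannot affect the low 32 bits */
  return (int)x;
}

/* Demoted form: only the low 32 bits are computed.  Unsigned arithmetic
   is used so that wrap-around is well defined. */
int demoted(long long x, int y)
{
  unsigned int ux = (unsigned int)x;

  ux *= (unsigned int)y;
  return (int)ux;
}

/* Usage: the two forms agree as long as the wide one does not overflow. */
void demo_demotion(void)
{
  assert(wide(3, 4) == demoted(3, 4));
  assert(wide(-5, 7) == demoted(-5, 7));
}
```

The addition of HIGH_ONLY disappears entirely in the demoted form, which is
the kind of redundant operation the paragraph above refers to.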

Jakub


Re: [RFC] type promotion pass

2017-09-15 Thread Richard Biener
On September 15, 2017 6:56:04 PM GMT+02:00, Jeff Law  wrote:
>On 09/15/2017 10:19 AM, Segher Boessenkool wrote:
>> On Fri, Sep 15, 2017 at 09:18:23AM -0600, Jeff Law wrote:
>>> WORD_REGISTER_OPERATIONS works with PROMOTE_MODE.  The reason you can't
>>> define WORD_REGISTER_OPERATIONS on aarch64 is that the implicit
>>> promotion is sometimes to 32 bits and sometimes to 64 bits.
>>> WORD_REGISTER_OPERATIONS can't really describe that.
>> 
>> WORD_REGISTER_OPERATIONS isn't well-defined.
>> 
>> """
>> @defmac WORD_REGISTER_OPERATIONS
>> Define this macro to 1 if operations between registers with integral mode
>> smaller than a word are always performed on the entire register.
>> Most RISC machines have this property and most CISC machines do not.
>> @end defmac
>> """
>> 
>> Exactly what operations?  For almost all targets it isn't true for *all*
>> operations.  Or no targets even, if you include rotate, etc.
>> 
>> For targets that have both 32-bit and 64-bit operations it is never true
>> either.
>> 
>>> And I'm also keen on doing something with type promotion -- Kai did some
>>> work in this space years ago which I found interesting, even if the work
>>> didn't go forward.  It showed a real weakness.  So I'm certainly
>>> interested in looking at Prathamesh's work -- with the caveat that if it
>>> stumbles across the same issues as Kai's work that it likely wouldn't be
>>> acceptable in its current form.
>> 
>> Doing type promotion too aggressively reduces code quality.  "Just" find
>> a sweet spot :-)
>> 
>> Example: on Power, an AND of QImode with 0xc3 is just one insn, which
>> actually does a SImode AND with 0xffc3.  This is what we do currently.
>> A SImode AND with 0x00c3 is two insns, or one if we allow it to write
>> to CR0 as well ("andi."); same for DImode, except there isn't a way to do
>> an AND with 0xffc3 in one insn at all.
>> 
>> unsigned char a;
>> void f(void) { a &= 0xc3; };
>Yes, these are some of the things we kicked around.  One of the most
>interesting conclusions was that for these target issues we'd really
>like a target.pd file to handle this class of transformations just prior
>to rtl expansion.
>
>Essentially early type promotion/demotion would be concerned with cases
>where we can eliminate operations in a target independent manner and
>narrow operands as much as possible.  Late promotion/demotion would deal
>with stuff like the target's desire to work on specific sized hunks in
>specific contexts.
>
>I'm greatly oversimplifying here.  Type promotion/demotion is fairly
>complex to get right.

I always thought we should start with those promotions that are done by RTL 
expansion according to PROMOTE_MODE and friends. The complication is that those 
promotions also apply to function calls and arguments and those are difficult 
to break apart from other ABI specific details. 

IIRC the last time we went over this patch I concluded a better first step 
would be to expose call ABI details on GIMPLE much earlier. But I may 
misremember here. 

Basically we couldn't really apply all promotions RTL expansion applies. One of 
my ideas with doing them early also was to simplify RTL expansion and 
especially promotion issues during SSA coalescing. 

Richard. 

>jeff



Re: [RFC] type promotion pass

2017-09-15 Thread Joseph Myers
On Fri, 15 Sep 2017, Richard Biener wrote:

> IIRC the last time we went over this patch I concluded a better first 
> step would be to expose call ABI details on GIMPLE much earlier. But I 
> may misremember here.

Some call details are exposed in the front ends (see 
targetm.calls.promote_prototypes called from c-typeck.c, for example).  I 
think that's too early; the front end should be generating IR for calls 
that reflects the language semantics, and architecture details should be 
exposed later (quite likely on GIMPLE).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC] type promotion pass

2017-09-15 Thread Michael Clark

> On 16 Sep 2017, at 1:04 AM, David Edelsohn  wrote:
> 
> On Tue, Sep 5, 2017 at 5:26 AM, Prathamesh Kulkarni
>  wrote:
>> Hi,
>> I have attached revamped version of Kugan's original patch for type promotion
>> (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00472.html)
>> rebased on r249469. The motivation of the pass is to minimize
>> generation of subregs
>> to avoid redundant zero/sign extensions by carrying out computations
>> in PROMOTE_MODE
>> as much as possible on tree-level.
>> 
>> * Working of pass
>> The pass is dominator-based, and tries to promote type of def to
>> PROMOTE_MODE in each gimple stmt. Before beginning domwalk, all the
>> default definitions are promoted to PROMOTE_MODE
>> in promote_all_ssa_defined_with_nop (). The patch adds a new tree code
>> SEXT_EXPR to represent sign-extension on tree level. CONVERT_EXPR is
>> either replaced by an explicit sign or zero extension depending on the
>> signedness of operands.
>> 
>> The core of the pass is in following two routines:
>> a) promote_ssa: This pass looks at the def of gimple_stmt, and
>> promotes type of def to promoted_type in-place if possible. If it
>> cannot promote def in-place, then
>> it transforms:
>> def = def_stmt
>> to
>> new_def = def_stmt
>> def = convert_expr new_def
>> where new_def is a clone of def, and type of def is set to promoted_type.
>> 
>> b) fixup_use: The main intent is to "fix" uses of a promoted variable
>> to preserve semantics
>> of the code, for instance if the variable is used in stmt where its
>> original type is required.
>> Another case is when def is not promoted by promote_ssa, but some uses
>> could be promoted.
>> 
>> promote_all_stmts () is the driver function that calls fixup_use and
>> promote_ssa for each stmt
>> within the basic block. The pass relies extensively on dom and vrp to
>> remove redundancies generated by the pass and is thus enabled only if
>> vrp is enabled.
>> 
>> Issues:
>> 1] Pass ordering: type-promote pass generates too many redundancies
>> which can hamper other optimizations. One case I observed was on arm
>> when it inhibited path splitting optimization because the number of
>> stmts in basic block exceeded the value of
>> param-max-jump-thread-duplication-stmts. So I placed the pass just
>> before dom. I am not sure if this is a good approach. Maybe the pass
>> itself
>> should avoid generating redundant statements and not rely too much on
>> dom and vrp ?
>> 
>> 2] Redundant copies left after the pass: When it's safe, vrp optimizes
>> _1 = op1 sext op2
>> into
>> _1 = op1
>> 
>> which leaves redundant copies that are not optimized away at GIMPLE level.
>> I placed pass_copy_prop after vrp to eliminate these copies, but I am not
>> sure if that is ideal. Maybe we should do this during vrp itself, as this
>> is the only case that generates redundant copies?
>> 
>> 3] Backward phi's: Since we traverse in dominated order, fixup_use()
>> gets called first on a ssa_name that is an argument of a backward-phi.
>> If it promotes the type and later if promote_ssa() decides that the
>> ssa_name should not be promoted, then we have to again walk the
>> backward phi's to undo the "incorrect" promotion, which is done by
>> emitting a convert_expr back to the original type from promoted_type.
>> While I suppose it shouldn't affect correctness, it generates
>> redundant casts and was wondering if there's a better approach to
>> handle this issue.
>> 
>> * SPEC2k6 benchmarking:
>> 
>> Results of  benchmarking the patch for aarch64-linux-gnu cortex-a57:
>> 
>> Improved:
>> 401.bzip2      +2.11%
>> 459.GemsFDTD   +1.42%
>> 464.h264ref    +1.96%
>> 471.omnetpp    +1.05%
>> 481.wrf        +0.99%
>> 
>> Regressed:
>> 447.dealII     -1.34%
>> 450.soplex     -1.54%
>> 456.hmmer      -3.79%
>> 482.sphinx3    -2.95%
>> 
>> The remaining benchmarks didn't have much difference. However there
>> was some noise
>> in the above run, and I suppose the numbers are not precise.
>> I will give another run with benchmarking. There was no significant
>> difference in code-size with and without patch for SPEC2k6.
>> 
>> * Validation
>> 
>> The patch passes bootstrap+test on x86_64-unknown-linux-gnu,
>> arm-linux-gnueabihf,
>> aarch64-linux-gnu and ppc64le-linux-gnu. On arm-linux-gnueabihf, and
>> aarch64-linux-gnu, there is following fallout:
>> 
>> 1] gcc.dg/attr-alloc_size-11.c missing range info for signed char
>> (test for warnings, line 50)
>> The issue seems to be that we call reset_flow_sensitive_info(def) if
>> def is promoted and that invalidates value range which is probably
>> causing the regression.
>> 
>> 2] gcc.dg/fold-narrowbopcst-1.c scan-tree-dump optimized " = _.* + 156"
>> This requires adjusting the scan-dump.
>> 
>> On ppc64le-linux-gnu, I am observing several regressions in the
>> testsuite. Most of these seem to be because type-promotion is
>> interfering with other optimizations, especially widening_mul and
>> bswap pass. Also observed 

Re: [RFC] type promotion pass

2017-09-15 Thread Segher Boessenkool
Hi!

On Sat, Sep 16, 2017 at 08:47:03AM +1200, Michael Clark wrote:
> RISC-V defines promote_mode on RV64 to promote SImode to signed DImode 
> subregisters.  I did an experiment on RISC-V to not promote SImode to DImode 
> and it improved codegen for many of my regression test cases, but 
> unfortunately it breaks the RISC-V ABI.

It sounds like you should implement TARGET_PROMOTE_FUNCTION_MODE as well?


Segher


Re: [RFC] type promotion pass

2017-09-15 Thread Michael Clark

> On 16 Sep 2017, at 8:59 AM, Segher Boessenkool wrote:
> 
> Hi!
> 
> On Sat, Sep 16, 2017 at 08:47:03AM +1200, Michael Clark wrote:
>> RISC-V defines promote_mode on RV64 to promote SImode to signed DImode 
>> subregisters.  I did an experiment on RISC-V to not promote SImode to DImode 
>> and it improved codegen for many of my regression test cases, but 
>> unfortunately it breaks the RISC-V ABI.
> 
> It sounds like you should implement TARGET_PROMOTE_FUNCTION_MODE as well?

riscv currently has default_promote_function_mode_always_promote.

gcc/config/riscv/riscv.c:#define TARGET_PROMOTE_FUNCTION_MODE default_promote_function_mode_always_promote

I see that default_promote_function_mode_always_promote just calls promote_mode

Is TARGET_PROMOTE_FUNCTION_MODE the hook used to canonicalise values *before* 
calls and *before* returns? I.e. would it be possible to make promote_mode a 
no-op (leave SImode values as SImode in the RTL), but have 
TARGET_PROMOTE_FUNCTION_MODE perform promotions similar to our current 
PROMOTE_MODE definition?

I’ll do some reading…

I was also curious about benchmarking the alternate ABI choice that leaves the 
upper bits undefined and does narrowing in the caller. It would be an ABI 
break, so it is a no-go, but I was curious about the performance of this option. 
Any 32-bit op causes narrowing on demand; it’s only the logical ops that would 
need to be explicitly narrowed in this alternate ABI. In any case, I don’t think 
the RISC-V maintainers would accept an ABI break. However, I’m curious as to the 
advantages and disadvantages of ABIs that do and don’t define the upper bits 
for narrower modes when passing values to and from functions.
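
To make the "narrowing on demand" idea concrete, here is a small C model.
It is only a sketch under assumptions: sext32, add32, and64 and demo_abi are
hypothetical helper names, 64-bit registers are modelled with uint64_t, and
add32 is meant to behave like a 32-bit add that sign-extends its result (in
the spirit of RV64's ADDW), not to describe GCC's RTL.  The conversion to
int32_t is implementation-defined for out-of-range values; GCC wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of a 64-bit register image. */
static uint64_t sext32(uint64_t reg)
{
  return (uint64_t)(int64_t)(int32_t)(uint32_t)reg;
}

/* 32-bit add on a 64-bit register file: the upper input bits are ignored
   and the result comes out canonical (sign-extended). */
static uint64_t add32(uint64_t a, uint64_t b)
{
  return sext32(a + b);
}

/* Full-width logical op: whatever sits in the upper bits propagates. */
static uint64_t and64(uint64_t a, uint64_t b)
{
  return a & b;
}

/* Usage: garbage in the upper half is harmless for add32 but survives
   and64 until the value is explicitly narrowed. */
static void demo_abi(void)
{
  uint64_t garbage = 0xdeadbeef00000000u;

  assert(add32(5 | garbage, 7) == add32(5, 7));
  assert(and64(5 | garbage, ~(uint64_t)0) != 5);
  assert(sext32(and64(5 | garbage, ~(uint64_t)0)) == 5);
}
```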

Thanks,
Michael.