Re: Byte swapping support
* Eric Botcazou:

>> To support applications that assume big-endian memory layout on
>> little-endian systems, I'm considering adding support for reversing the
>> storage order to GCC.
>
> That was also the goal of the scalar_storage_order attribute.

By the way, what happened to the C++ bits? I think the c-family patch
which went in assumes that the C++ bits are there as well.
Re: Byte swapping support
> By the way, what happened to the C++ bits? I think the c-family patch
> which went in assumes that the C++ bits are there as well.

I don't really understand what you mean by "assume" here, but the C++ bits
were incomplete and never got reviewed; I can resurrect them if there is
some interest though.

-- 
Eric Botcazou
Re: Infering that the condition of a for loop is initially true?
On Fri, Sep 15, 2017 at 12:07 AM, Jeff Law wrote:
> On 09/14/2017 01:28 PM, Niels Möller wrote:
>> This is more of a question than a bug report, so I'm trying to send it
>> to the list rather than filing a bugzilla issue.
>>
>> I think it's quite common to write for- and while-loops where the
>> condition is always initially true. A simple example might be
>>
>>   double average (const double *a, size_t n)
>>   {
>>     double sum;
>>     size_t i;
>>
>>     assert (n > 0);
>>     for (i = 0, sum = 0; i < n; i++)
>>       sum += a[i];
>>     return sum / n;
>>   }
>>
>> The programmer could do the micro-optimization to rewrite it as a
>> do-while-loop instead. It would be nice if gcc could infer that the
>> condition is initially true, and convert to a do-while loop
>> automatically.
>>
>> Converting to a do-while-loop should produce slightly better code,
>> omitting the typical jump to enter the loop at the end where the
>> condition is checked. It would also make analysis of where variables are
>> written more accurate, which is my main concern at the moment.
>>
>> My questions are:
>>
>> 1. Does gcc attempt to do this optimization?
> Yes. It happens as a side effect of jump threading and there are also
> dedicated passes to rotate the loop.

The loop header copying does this.

>>
>> 2. If it does, how often does it succeed on loops in real programs?
> Often. The net benefit is actually small though and sometimes this kind
> of loop rotation can impede vectorization.

Most loop optimizers do not deal well with loops that may not execute, or
rather they do not handle the number of iterations being computed as "n or
zero". The vectorizer is an exception to this as it will simply version
the loop with the extra condition(s).

>
>>
>> 3. Can I help the compiler to do that inference?
> In general, I'd advise against it. You end up with ugly code which
> works with specific versions of the compiler, but which needs regular
> tweaking as the internal implementations of various optimizers change
> over time.
>
>>
>> The code I had some trouble with is at
>> https://git.lysator.liu.se/nettle/nettle/blob/master/ecc-mod.c. A
>> simplified version with only the interesting code path would be
>>
>>   void
>>   ecc_mod (mp_size_t mn, mp_size_t bn, mp_limb_t *rp)
>>   {
>>     mp_limb_t hi;
>>     mp_size_t sn = mn - bn;
>>     mp_size_t rn = 2*mn;
>>
>>     assert (bn < mn);
>>
>>     while (rn >= 2 * mn - bn)
> In this particular case (ignoring the assert), what you want is better
> jump threading exploiting range propagation. But you have to be real
> careful here due to the potential overflow.
>
> I'd have to have a self-contained example to dig into what's really
> going on, but my suspicion is either overflow or fairly weak range data
> and simplification due to the symbolic ranges.
>
> Jeff
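For readers who want to see the rewrite under discussion, here is a minimal sketch of the quoted average() function in both shapes. The names average_for and average_do_while are made up for this illustration; the second function is only the hand-rotated source form and is not a claim about what GCC's loop header copying emits.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape from the thread: the condition is tested before the
   first iteration, even though n > 0 makes it initially true. */
double average_for(const double *a, size_t n)
{
    double sum;
    size_t i;

    assert(n > 0);
    for (i = 0, sum = 0; i < n; i++)
        sum += a[i];
    return sum / n;
}

/* Hand-rotated shape: the body runs once before the first test.
   This is only equivalent because callers guarantee n > 0. */
double average_do_while(const double *a, size_t n)
{
    double sum = 0;
    size_t i = 0;

    assert(n > 0);
    do {
        sum += a[i];
        i++;
    } while (i < n);
    return sum / n;
}
```

Both functions return the same value for every non-empty input; the do-while form simply drops the initial test that the assertion makes redundant.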
Re: Byte swapping support
* Eric Botcazou:

>> By the way, what happened to the C++ bits? I think the c-family patch
>> which went in assumes that the C++ bits are there as well.
>
> I don't really understand what you mean by "assume" here,

handle_pragma_scalar_storage_order does not check c_dialect_cxx, so it
will not issue a warning for C++ even though the pragma is effectively
ignored.
Re: Byte swapping support
> handle_pragma_scalar_storage_order does not check c_dialect_cxx, so it
> will not issue a warning for C++ even though the pragma is effectively
> ignored.

Indeed, unlike for the attribute, will fix, thanks.

-- 
Eric Botcazou
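As background for this thread, the following is a small sketch of what "reversing the storage order" means when done by hand: a 32-bit value is written to and read from memory in big-endian order regardless of the host. The helper names store_be32 and load_be32 are hypothetical. The code does not use the scalar_storage_order attribute or pragma; it only shows the kind of access such a feature is meant to generate on the programmer's behalf.

```c
#include <assert.h>
#include <stdint.h>

/* Store v at p in big-endian order, whatever the host byte order is. */
void store_be32(unsigned char *p, uint32_t v)
{
    assert(p != 0);
    p[0] = (unsigned char)(v >> 24);   /* most significant byte first */
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;           /* least significant byte last */
}

/* Load a big-endian 32-bit value from p. */
uint32_t load_be32(const unsigned char *p)
{
    assert(p != 0);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Because the helpers work byte by byte, they behave identically on little-endian and big-endian hosts.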
Re: [RFC] type promotion pass
Hi Prathamesh, I've tried out the latest version and it works really well. It built and ran SPEC2017 without any issues or regressions (I didn't do a detailed comparison which would mean multiple runs, however a single run showed performance is pretty much the same on INT and 0.1% faster on FP). Codesize reduces in almost all cases (only xalancbmk increases by 600 bytes), sometimes by a huge amount. For example in gcc_r around 20% of all AND immediate instructions are removed, clear proof it removes many redundant zero/sign extensions. So consider this a big +1 from me! GCC is behind other compilers with respect to this kind of optimization and it looks like this phase does a major catchup. Like I mentioned, it doesn't have to be 100% perfect, once it has been committed, we can fine tune it and add more optimizations. Wilco
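To make the "redundant zero/sign extensions" concrete, here is a source-level analogy only, not the pass and not its output. The two hypothetical functions below compute the same 8-bit sum; the first narrows after every addition (modelling an extension per operation), the second computes in a wider type and narrows once at the end, which is the effect type promotion aims for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Narrow after every step: models one zero-extension per addition. */
uint8_t sum_narrow_each_step(const uint8_t *p, size_t n)
{
    uint8_t acc = 0;

    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc = (uint8_t)(acc + p[i]);
    return acc;
}

/* Compute in a 32-bit type and narrow once: unsigned arithmetic is
   modular, so the low 8 bits are the same as in the function above. */
uint8_t sum_promoted(const uint8_t *p, size_t n)
{
    uint32_t acc = 0;

    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc += p[i];
    return (uint8_t)acc;
}
```

For the input {200, 100, 50} both functions return 94, that is 350 modulo 256.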
Re: [RFC] type promotion pass
On Tue, Sep 5, 2017 at 5:26 AM, Prathamesh Kulkarni wrote:
> Hi,
> I have attached revamped version of Kugan's original patch for type promotion
> (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00472.html)
> rebased on r249469. The motivation of the pass is to minimize
> generation of subregs to avoid redundant zero/sign extensions by
> carrying out computations in PROMOTE_MODE as much as possible on
> tree-level.
>
> * Working of pass
> The pass is dominator-based, and tries to promote type of def to
> PROMOTE_MODE in each gimple stmt. Before beginning domwalk, all the
> default definitions are promoted to PROMOTE_MODE
> in promote_all_ssa_defined_with_nop (). The patch adds a new tree code
> SEXT_EXPR to represent sign-extension on tree level. CONVERT_EXPR is
> either replaced by an explicit sign or zero extension depending on the
> signedness of operands.
>
> The core of the pass is in following two routines:
> a) promote_ssa: This pass looks at the def of gimple_stmt, and
> promotes type of def to promoted_type in-place if possible. If it
> cannot promote def in-place, then it transforms:
>   def = def_stmt
> to
>   new_def = def_stmt
>   def = convert_expr new_def
> where new_def is a clone of def, and type of def is set to promoted_type.
>
> b) fixup_use: The main intent is to "fix" uses of a promoted variable
> to preserve semantics of the code, for instance if the variable is
> used in stmt where its original type is required.
> Another case is when def is not promoted by promote_ssa, but some uses
> could be promoted.
>
> promote_all_stmts () is the driver function that calls fixup_use and
> promote_ssa for each stmt within the basic block. The pass relies
> extensively on dom and vrp to remove redundancies generated by the
> pass and is thus enabled only if vrp is enabled.
>
> Issues:
> 1] Pass ordering: type-promote pass generates too many redundancies
> which can hamper other optimizations. One case I observed was on arm
> when it inhibited path splitting optimization because the number of
> stmts in basic block exceeded the value of
> param-max-jump-thread-duplication-stmts. So I placed the pass just
> before dom. I am not sure if this is a good approach. Maybe the pass
> itself should avoid generating redundant statements and not rely too
> much on dom and vrp ?
>
> 2] Redundant copies left after the pass: When it's safe, vrp optimizes
>   _1 = op1 sext op2
> into
>   _1 = op1
>
> which leaves redundant copies that are not optimized away at GIMPLE level.
> I placed pass_copy_prop after vrp to eliminate these copies but not sure if
> that is ideal. Maybe we should do this during vrp itself as this is the
> only case that generates redundant copies ?
>
> 3] Backward phi's: Since we traverse in dominated order, fixup_use()
> gets called first on a ssa_name that is an argument of a backward-phi.
> If it promotes the type and later if promote_ssa() decides that the
> ssa_name should not be promoted, then we have to again walk the
> backward phi's to undo the "incorrect" promotion, which is done by
> emitting a convert_expr back to the original type from promoted_type.
> While I suppose it shouldn't affect correctness, it generates
> redundant casts and was wondering if there's a better approach to
> handle this issue.
>
> * SPEC2k6 benchmarking:
>
> Results of benchmarking the patch for aarch64-linux-gnu cortex-a57:
>
> Improved:
> 401.bzip2     +2.11%
> 459.GemsFDTD  +1.42%
> 464.h264ref   +1.96%
> 471.omnetpp   +1.05%
> 481.wrf       +0.99%
>
> Regressed:
> 447.dealII    -1.34%
> 450.soplex    -1.54%
> 456.hmmer     -3.79%
> 482.sphinx3   -2.95%
>
> The remaining benchmarks didn't have much difference. However there
> was some noise in the above run, and I suppose the numbers are not
> precise. I will give another run with benchmarking. There was no
> significant difference in code-size with and without patch for SPEC2k6.
>
> * Validation
>
> The patch passes bootstrap+test on x86_64-unknown-linux-gnu,
> arm-linux-gnueabihf, aarch64-linux-gnu and ppc64le-linux-gnu. On
> arm-linux-gnueabihf and aarch64-linux-gnu, there is the following fallout:
>
> 1] gcc.dg/attr-alloc_size-11.c missing range info for signed char
> (test for warnings, line 50)
> The issue seems to be that we call reset_flow_sensitive_info(def) if
> def is promoted and that invalidates value range which is probably
> causing the regression.
>
> 2] gcc.dg/fold-narrowbopcst-1.c scan-tree-dump optimized " = _.* + 156"
> This requires adjusting the scan-dump.
>
> On ppc64le-linux-gnu, I am observing several regressions in the
> testsuite. Most of these seem to be because type-promotion is
> interfering with other optimizations, especially widening_mul and
> bswap pass. Also observed fallout for some cases due to interference
> with strlen, tail-call and jump-threading passes. I suppose I am not
> seeing these on arm/aarch64 because ppc64 defines
> PROMOTE_MODE to be DImode and
Re: [RFC] type promotion pass
David Edelsohn wrote:
> Why does AArch64 define PROMOTE_MODE as SImode? GCC ports for other
> RISC targets mostly seem to use a 64-bit mode. Maybe SImode is the
> correct definition based on the current GCC optimization
> infrastructure, but this seems like a change that should be applied to
> all 64 bit RISC targets.

The reason is that AArch64 supports both 32-bit and 64-bit registers, so
when using char/short you want 32-bit operations. There is an issue in that
WORD_REGISTER_OPERATIONS isn't set on AArch64, but it should be. Maybe that
requires some cleanups to ensure it correctly interacts with PROMOTE_MODE.
There are way too many confusing target defines like this and no general
mechanism that just works like you'd expect. Promoting to an orthogonal set
of registers is not something particularly unusual, so it's something GCC
should support well by default...

Wilco
Re: [RFC] type promotion pass
On 09/15/2017 07:47 AM, Wilco Dijkstra wrote:
> David Edelsohn wrote:
>
>> Why does AArch64 define PROMOTE_MODE as SImode? GCC ports for other
>> RISC targets mostly seem to use a 64-bit mode. Maybe SImode is the
>> correct definition based on the current GCC optimization
>> infrastructure, but this seems like a change that should be applied to
>> all 64 bit RISC targets.
>
> The reason is that AArch64 supports both 32-bit and 64-bit registers, so
> when using char/short you want 32-bit operations. There is an issue in
> that WORD_REGISTER_OPERATIONS isn't set on AArch64, but it should be.
> Maybe that requires some cleanups to ensure it correctly interacts with
> PROMOTE_MODE. There are way too many confusing target defines like this
> and no general mechanism that just works like you'd expect. Promoting
> to an orthogonal set of registers is not something particularly unusual,
> so it's something GCC should support well by default...

Note this ties in directly with the conversation Steve and I have been
having in another thread.

WORD_REGISTER_OPERATIONS works with PROMOTE_MODE. The reason you can't
define WORD_REGISTER_OPERATIONS on aarch64 is that the implicit
promotion is sometimes to 32 bits and sometimes to 64 bits.
WORD_REGISTER_OPERATIONS can't really describe that.

Ideally the way forward is to address that limitation of
WORD_REGISTER_OPERATIONS which will eliminate a large number of zero
extensions.

I also think improving REE would help -- in particular having it handle
subregs which are just another way of expressing an extension. I
suspect that would also allow folding away a goodly amount of extensions.

And I'm also keen on doing something with type promotion -- Kai did some
work in this space years ago which I found interesting, even if the work
didn't go forward. It showed a real weakness. So I'm certainly
interested in looking at Prathamesh's work -- with the caveat that if it
stumbles across the same issues as Kai's work that it likely wouldn't be
acceptable in its current form.

Jeff
Re: [RFC] type promotion pass
On Fri, Sep 15, 2017 at 09:18:23AM -0600, Jeff Law wrote:
> WORD_REGISTER_OPERATIONS works with PROMOTE_MODE. The reason you can't
> define WORD_REGISTER_OPERATIONS on aarch64 is that the implicit
> promotion is sometimes to 32 bits and sometimes to 64 bits.
> WORD_REGISTER_OPERATIONS can't really describe that.

WORD_REGISTER_OPERATIONS isn't well-defined.

"""
@defmac WORD_REGISTER_OPERATIONS
Define this macro to 1 if operations between registers with integral mode
smaller than a word are always performed on the entire register.
Most RISC machines have this property and most CISC machines do not.
@end defmac
"""

Exactly what operations? For almost all targets it isn't true for *all*
operations. Or no targets even, if you include rotate, etc.

For targets that have both 32-bit and 64-bit operations it is never true
either.

> And I'm also keen on doing something with type promotion -- Kai did some
> work in this space years ago which I found interesting, even if the work
> didn't go forward. It showed a real weakness. So I'm certainly
> interested in looking at Prathamesh's work -- with the caveat that if it
> stumbles across the same issues as Kai's work that it likely wouldn't be
> acceptable in its current form.

Doing type promotion too aggressively reduces code quality. "Just" find
a sweet spot :-)

Example: on Power, an AND of QImode with 0xc3 is just one insn, which
actually does a SImode AND with 0xffc3. This is what we do currently.
A SImode AND with 0x00c3 is two insns, or one if we allow it to write
to CR0 as well ("andi."); same for DImode, except there isn't a way to do
an AND with 0xffc3 in one insn at all.

unsigned char a;
void f(void) { a &= 0xc3; };

Segher
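The reason the two masks in the Power example are interchangeable for an unsigned char can be checked in plain C. This is a minimal sketch with hypothetical helper names, where reg stands for a 32-bit register whose low 8 bits hold the value of a; it says nothing about which instructions a given target selects.

```c
#include <assert.h>
#include <stdint.h>

/* AND with the "promoted" mask 0x00c3, then keep the low byte. */
uint8_t and_mask_00c3(uint32_t reg)
{
    return (uint8_t)(reg & 0x00c3u);
}

/* AND with the wider mask 0xffc3, then keep the low byte. */
uint8_t and_mask_ffc3(uint32_t reg)
{
    return (uint8_t)(reg & 0xffc3u);
}

/* The masks differ only above bit 7, so the stored byte is identical. */
void check_masks_agree(uint32_t reg)
{
    assert(and_mask_00c3(reg) == and_mask_ffc3(reg));
}
```

Since only the low byte is stored back to a, the compiler is free to pick whichever mask is cheaper on the target, which is the freedom that overly aggressive promotion can take away.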
Re: [RFC] type promotion pass
On 09/15/2017 10:19 AM, Segher Boessenkool wrote:
> On Fri, Sep 15, 2017 at 09:18:23AM -0600, Jeff Law wrote:
>> WORD_REGISTER_OPERATIONS works with PROMOTE_MODE. The reason you can't
>> define WORD_REGISTER_OPERATIONS on aarch64 is that the implicit
>> promotion is sometimes to 32 bits and sometimes to 64 bits.
>> WORD_REGISTER_OPERATIONS can't really describe that.
>
> WORD_REGISTER_OPERATIONS isn't well-defined.
>
> """
> @defmac WORD_REGISTER_OPERATIONS
> Define this macro to 1 if operations between registers with integral mode
> smaller than a word are always performed on the entire register.
> Most RISC machines have this property and most CISC machines do not.
> @end defmac
> """
>
> Exactly what operations? For almost all targets it isn't true for *all*
> operations. Or no targets even, if you include rotate, etc.
>
> For targets that have both 32-bit and 64-bit operations it is never true
> either.
>
>> And I'm also keen on doing something with type promotion -- Kai did some
>> work in this space years ago which I found interesting, even if the work
>> didn't go forward. It showed a real weakness. So I'm certainly
>> interested in looking at Prathamesh's work -- with the caveat that if it
>> stumbles across the same issues as Kai's work that it likely wouldn't be
>> acceptable in its current form.
>
> Doing type promotion too aggressively reduces code quality. "Just" find
> a sweet spot :-)
>
> Example: on Power, an AND of QImode with 0xc3 is just one insn, which
> actually does a SImode AND with 0xffc3. This is what we do currently.
> A SImode AND with 0x00c3 is two insns, or one if we allow it to write
> to CR0 as well ("andi."); same for DImode, except there isn't a way to do
> an AND with 0xffc3 in one insn at all.
>
> unsigned char a;
> void f(void) { a &= 0xc3; };

Yes, these are some of the things we kicked around. One of the most
interesting conclusions was that for these target issues we'd really
like a target.pd file to handle this class of transformations just prior
to rtl expansion.

Essentially early type promotion/demotion would be concerned with cases
where we can eliminate operations in a target independent manner and
narrow operands as much as possible. Late promotion/demotion would deal
with stuff like the target's desire to work on specific sized hunks in
specific contexts.

I'm greatly oversimplifying here. Type promotion/demotion is fairly
complex to get right.

jeff
Re: [RFC] type promotion pass
On Fri, Sep 15, 2017 at 10:56:04AM -0600, Jeff Law wrote:
> Yes, these are some of the things we kicked around. One of the most
> interesting conclusions was that for these target issues we'd really
> like a target.pd file to handle this class of transformations just prior
> to rtl expansion.

But often combine will need to know about which bits you actually care
about (and other passes too, but combine is the biggie).

> Essentially early type promotion/demotion would be concerned with cases
> where we can eliminate operations in a target independent manner and
> narrow operands as much as possible. Late promotion/demotion would deal
> with stuff like the target's desire to work on specific sized hunks in
> specific contexts.
>
> I'm greatly oversimplifying here. Type promotion/demotion is fairly
> complex to get right.

Yeah :-(

Maybe the best thing is to promote really early, but to keep track of which
bits matter. And then adjust some passes to take that into account. Not a
trivial amount of work.

Segher
Pierre-Marie de Rodat appointed Ada co-maintainer
I am pleased to announce that the GCC Steering Committee has appointed Pierre-Marie de Rodat as Ada co-maintainer. Please join me in congratulating Pierre-Marie on his new role. P-M, please update your listing in the MAINTAINERS file. Happy hacking! David
Re: [RFC] type promotion pass
On Fri, Sep 15, 2017 at 12:13:39PM -0500, Segher Boessenkool wrote:
> On Fri, Sep 15, 2017 at 10:56:04AM -0600, Jeff Law wrote:
> > Yes, these are some of the things we kicked around. One of the most
> > interesting conclusions was that for these target issues we'd really
> > like a target.pd file to handle this class of transformations just prior
> > to rtl expansion.
>
> But often combine will need to know about which bits you actually care
> about (and other passes too, but combine is the biggie).
>
> > Essentially early type promotion/demotion would be concerned with cases
> > where we can eliminate operations in a target independent manner and
> > narrow operands as much as possible. Late promotion/demotion would deal
> > with stuff like the target's desire to work on specific sized hunks in
> > specific contexts.
> >
> > I'm greatly oversimplifying here. Type promotion/demotion is fairly
> > complex to get right.
>
> Yeah :-(
>
> Maybe the best thing is to promote really early, but to keep track of which
> bits matter. And then adjust some passes to take that into account. Not a
> trivial amount of work.

Is type promotion actually what we want to do early? I'd think type
demotion is what better canonicalizes the IL and removes redundant
operations (e.g. those affecting only high bits if we only care about low
bits). Then another spot is the vectorizer, where we ideally should be
promoting or demoting types such that we have as few different type bitsizes
in the loop. And then somewhere late we should in a target driven way
decide what is the optimal type to perform operations (promote or leave
unpromoted).

Jakub
Re: [RFC] type promotion pass
On Fri, Sep 15, 2017 at 08:40:41PM +0200, Jakub Jelinek wrote:
> > > I'm greatly oversimplifying here. Type promotion/demotion is fairly
> > > complex to get right.
> >
> > Yeah :-(
> >
> > Maybe the best thing is to promote really early, but to keep track of which
> > bits matter. And then adjust some passes to take that into account. Not a
> > trivial amount of work.
>
> Is type promotion actually what we want to do early? I'd think type
> demotion is what better canonicalizes the IL and removes redundant
> operations (e.g. those affecting only high bits if we only care about low
> bits).

On gimple we already have smallest type possible, I think? When expanding
to RTL that then needs to only use instructions that exist for the target.
And then problems happen -- we only have instructions that work on full
registers on many targets (or also on 32-bit items), but we do not care
about the higher bits in some cases, or we only need it to be sign/zero
extended and we do not need a separate extend insn in many cases (but the
RTL passes do not realise that).

Or do you see problems during gimple as well?

Segher
Re: [RFC] type promotion pass
On Fri, Sep 15, 2017 at 02:06:22PM -0500, Segher Boessenkool wrote:
> On Fri, Sep 15, 2017 at 08:40:41PM +0200, Jakub Jelinek wrote:
> > > > I'm greatly oversimplifying here. Type promotion/demotion is fairly
> > > > complex to get right.
> > >
> > > Yeah :-(
> > >
> > > Maybe the best thing is to promote really early, but to keep track of
> > > which bits matter. And then adjust some passes to take that into
> > > account. Not a trivial amount of work.
> >
> > Is type promotion actually what we want to do early? I'd think type
> > demotion is what better canonicalizes the IL and removes redundant
> > operations (e.g. those affecting only high bits if we only care about low
> > bits).
>
> On gimple we already have smallest type possible, I think? When expanding

It is worse than you think then. We don't. Just try:

int foo (__int128 x, __int128 y)
{
  __int128 z, a, b;
  z = x + y;
  a = z * 4;
  b = a - x;
  return b;
}

int bar (long long x, int y)
{
  x *= y;
  x += 0xffeb00LL;
  return x;
}

We keep the user selected types up to expansion, e.g. the
x += 0xffeb00LL; is completely useless etc. If we demoted, we could perform
all the computation on unsigned int.

The only place we have demotion is the get_unwidened etc. stuff that is done
in convert.c, but that is done only in the FEs and thus only happens within
the same statement. There is nothing that repeats that on GIMPLE.

A disadvantage of demotion (either the one we do in the FEs or the one we
don't do on GIMPLE) is that for signed wider arithmetics if we demote we
need unsigned arithmetics.

Jakub
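What "perform all the computation on unsigned int" would look like for bar can be sketched by hand. Here bar_wide mirrors the function from the message and bar_demoted is a hypothetical manual demotion, not compiler output. The sketch assumes a 32-bit int and that converting an out-of-range long long to int wraps modulo 2^32, which the C standard leaves implementation-defined but GCC documents.

```c
#include <assert.h>

/* As written in the thread: arithmetic in long long, narrowed on return.
   The final conversion is implementation-defined if the value does not
   fit in int; GCC reduces it modulo 2^32. */
int bar_wide(long long x, int y)
{
    x *= y;
    x += 0xffeb00LL;
    return (int)x;
}

/* Hand-demoted variant: only the low 32 bits are ever computed.
   Unsigned arithmetic is used so that wraparound is well defined,
   matching the caveat about needing unsigned arithmetic. */
int bar_demoted(long long x, int y)
{
    unsigned int r = (unsigned int)x * (unsigned int)y;

    r += 0xffeb00u;
    return (int)r;
}

/* Both variants agree as long as the wide multiply does not overflow. */
void check_bar(long long x, int y)
{
    assert(bar_wide(x, y) == bar_demoted(x, y));
}
```

The low 32 bits of a product and a sum depend only on the low 32 bits of the operands, which is why the narrow computation gives the same answer.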
Re: [RFC] type promotion pass
On September 15, 2017 6:56:04 PM GMT+02:00, Jeff Law wrote:
>On 09/15/2017 10:19 AM, Segher Boessenkool wrote:
>> On Fri, Sep 15, 2017 at 09:18:23AM -0600, Jeff Law wrote:
>>> WORD_REGISTER_OPERATIONS works with PROMOTE_MODE. The reason you can't
>>> define WORD_REGISTER_OPERATIONS on aarch64 is that the implicit
>>> promotion is sometimes to 32 bits and sometimes to 64 bits.
>>> WORD_REGISTER_OPERATIONS can't really describe that.
>>
>> WORD_REGISTER_OPERATIONS isn't well-defined.
>>
>> """
>> @defmac WORD_REGISTER_OPERATIONS
>> Define this macro to 1 if operations between registers with integral mode
>> smaller than a word are always performed on the entire register.
>> Most RISC machines have this property and most CISC machines do not.
>> @end defmac
>> """
>>
>> Exactly what operations? For almost all targets it isn't true for *all*
>> operations. Or no targets even, if you include rotate, etc.
>>
>> For targets that have both 32-bit and 64-bit operations it is never true
>> either.
>>
>>> And I'm also keen on doing something with type promotion -- Kai did some
>>> work in this space years ago which I found interesting, even if the work
>>> didn't go forward. It showed a real weakness. So I'm certainly
>>> interested in looking at Prathamesh's work -- with the caveat that if it
>>> stumbles across the same issues as Kai's work that it likely wouldn't be
>>> acceptable in its current form.
>>
>> Doing type promotion too aggressively reduces code quality. "Just" find
>> a sweet spot :-)
>>
>> Example: on Power, an AND of QImode with 0xc3 is just one insn, which
>> actually does a SImode AND with 0xffc3. This is what we do currently.
>> A SImode AND with 0x00c3 is two insns, or one if we allow it to write
>> to CR0 as well ("andi."); same for DImode, except there isn't a way to do
>> an AND with 0xffc3 in one insn at all.
>>
>> unsigned char a;
>> void f(void) { a &= 0xc3; };
>Yes, these are some of the things we kicked around. One of the most
>interesting conclusions was that for these target issues we'd really
>like a target.pd file to handle this class of transformations just prior
>to rtl expansion.
>
>Essentially early type promotion/demotion would be concerned with cases
>where we can eliminate operations in a target independent manner and
>narrow operands as much as possible. Late promotion/demotion would deal
>with stuff like the target's desire to work on specific sized hunks in
>specific contexts.
>
>I'm greatly oversimplifying here. Type promotion/demotion is fairly
>complex to get right.

I always thought we should start with those promotions that are done by RTL
expansion according to PROMOTE_MODE and friends. The complication is that
those promotions also apply to function calls and arguments and those are
difficult to break apart from other ABI specific details.

IIRC the last time we went over this patch I concluded a better first step
would be to expose call ABI details on GIMPLE much earlier. But I may
misremember here.

Basically we couldn't really apply all promotions RTL expansion applies.
One of my ideas with doing them early also was to simplify RTL expansion
and especially promotion issues during SSA coalescing.

Richard.

>jeff
Re: [RFC] type promotion pass
On Fri, 15 Sep 2017, Richard Biener wrote:

> IIRC the last time we went over this patch I concluded a better first
> step would be to expose call ABI details on GIMPLE much earlier. But I
> may misremember here.

Some call details are exposed in the front ends (see
targetm.calls.promote_prototypes called from c-typeck.c, for example). I
think that's too early; the front end should be generating IR for calls
that reflects the language semantics, and architecture details should be
exposed later (quite likely on GIMPLE).

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [RFC] type promotion pass
Hi!

On Sat, Sep 16, 2017 at 08:47:03AM +1200, Michael Clark wrote:
> RISC-V defines promote_mode on RV64 to promote SImode to signed DImode
> subregisters. I did an experiment on RISC-V to not promote SImode to DImode
> and it improved codegen for many of my regression test cases, but
> unfortunately it breaks the RISC-V ABI.

It sounds like you should implement TARGET_PROMOTE_FUNCTION_MODE as well?

Segher
Re: [RFC] type promotion pass
> On 16 Sep 2017, at 8:59 AM, Segher Boessenkool wrote:
>
> Hi!
>
> On Sat, Sep 16, 2017 at 08:47:03AM +1200, Michael Clark wrote:
>> RISC-V defines promote_mode on RV64 to promote SImode to signed DImode
>> subregisters. I did an experiment on RISC-V to not promote SImode to DImode
>> and it improved codegen for many of my regression test cases, but
>> unfortunately it breaks the RISC-V ABI.
>
> It sounds like you should implement TARGET_PROMOTE_FUNCTION_MODE as well?

riscv currently has default_promote_function_mode_always_promote.

gcc/config/riscv/riscv.c:#define TARGET_PROMOTE_FUNCTION_MODE default_promote_function_mode_always_promote

I see that default_promote_function_mode_always_promote just calls
promote_mode.

Is TARGET_PROMOTE_FUNCTION_MODE used to perform canonicalisation before
calls and returns? i.e. would it be possible to have promote_mode as a
no-op (leave SImode values as SImode in the RTL), but have
TARGET_PROMOTE_FUNCTION_MODE perform promotions similar to our current
PROMOTE_MODE definition? i.e. is TARGET_PROMOTE_FUNCTION_MODE the hook that
is used to canonicalise values *before* calls and *before* returns?

I’ll do some reading…

I was also curious about benchmarking the alternate ABI choice that leaves
the upper bits undefined, and does narrowing in the caller. It would be an
ABI break so is a no go, but I was curious about the performance of this
option. Any 32-bit op causes narrowing on demand. It’s only the logical
ops that would need to be explicitly narrowed in this alternate ABI. In any
case I don’t think the RISC-V maintainers would accept an ABI break;
however, I’m curious as to the advantages and disadvantages of ABIs that do
and don’t define the upper bits for narrower modes when passing values to
and from functions.

Thanks,
Michael.
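The two ABI choices being compared can be modelled in portable C, treating a uint64_t as a 64-bit register that carries a 32-bit value. Everything here is a hypothetical model for illustration (sext32, is_canonical and the two comparison helpers are invented names); it does not describe GCC's RISC-V back end or any hook.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of reg to 64 bits without relying on
   implementation-defined signed conversions. */
uint64_t sext32(uint64_t reg)
{
    uint64_t low = reg & UINT64_C(0xffffffff);

    return (low ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* A register is in canonical form if it already equals its own
   sign extension from 32 bits. */
int is_canonical(uint64_t reg)
{
    return reg == sext32(reg);
}

/* ABI with defined upper bits: the caller canonicalises, so the callee
   may compare whole registers. */
int equal_callee_trusts_caller(uint64_t a, uint64_t b)
{
    assert(is_canonical(a) && is_canonical(b));
    return a == b;
}

/* Alternate ABI with undefined upper bits: the callee narrows on demand
   before any operation that looks at the upper bits. */
int equal_callee_narrows(uint64_t a, uint64_t b)
{
    return sext32(a) == sext32(b);
}
```

In the first model the cost of sext32 is paid at every call boundary; in the second it is paid only where an operation actually observes the upper bits, which is the trade-off the message asks about.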