timisers. Or do we have an example where that does not
work or is inconvenient?
Segher
least, it seems to be this patch, I haven't actually checked).
Some representative backtraces:
/home/segher/src/gcc/gcc/testsuite/gcc.c-torture/compile/pr54713-1.c: In
function 'f1':
/home/segher/src/gcc/gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:13:1:
internal compiler error
nt, well, do you really care? :-)
(That second case might eventually fold to your original expression).
Segher
> So, my suggestion would be to warn for any call with a nonzero value.
The current documentation says that you should only use nonzero values
for debug purposes. A warning would help yes, how many people read the
manual after all :-)
Segher
I, as defined in config/elfos.h will not emit NUL
> - characters - instead it treats them as sub-string separators. Since
> - we want to emit NUL strings terminators into the object file we have to
> use
> - ASM_OUTPUT_SKIP. */
> + FIXME: target.asm_out.output_ascii, as defined in config/elfos.h will not
targetm?
Segher
(FILE *f, const char *label)
> +{
> + assemble_name (f, label);
> + if (TARGET_GAS)
> +fputs (":\n", f);
> + else
> +fputc ('\n', (f));
You forgot to remove the extra parens here :-)
If so many targets use default_assemble_label, can you make it an actual
default instead of defining it everywhere?
Segher
this for target maintainers to do? It
> seems like an improvement over what we have, and I don't think they are
> getting in anyones way.
If you're not confident removing it yourself, then don't. It isn't
getting in anyone's way, simply because there is so *much* clutter...
Segher
On Mon, Jul 27, 2015 at 09:11:12AM +0100, Kyrill Tkachov wrote:
> On 25/07/15 03:19, Segher Boessenkool wrote:
> >On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote:
> >>This transformation folds (X % C) == N into
> >>X & ((1 << (size - 1)) | (C
o argument can have unpredictable
> +effects, including crashing the calling program. As a result, calls
> +that are considered unsafe are diagnosed when the @option{-Wbuiltin-address}
> +option is in effect. Such calls are typically only useful in debugging
> +situations.
I like the original "should only be used" better than that last line.
Elsewhere there was a "non-zero" btw, but we should use "nonzero" according
to the coding conventions. Huh.
> +void* __attribute__ ((weak))
Not all targets support weak.
Segher
16QImode);
> +}
> + else
> +operands[1] = operands[2] = operands[3] = operands[0];
This won't work (in the pattern you write to op 3 before reading from op 2).
Do you ever call this expander late, anyway?
Segher
On Wed, Jul 29, 2015 at 06:38:45PM -0400, Michael Meissner wrote:
> On Wed, Jul 29, 2015 at 04:59:23PM -0500, Segher Boessenkool wrote:
> > On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> > > +;; Return constant 0x8000 in an Altiv
er could
> function correctly if this ever changed in the middle of a function.
It is also very ugly and much harder to read: it is longer, with more
useless interpunction, and there is nothing that makes clear it is a
constant.
Segher
Paper bag time. Committing as obvious fix. Bootstrapped and regression
checked on powerpc64-linux and powerpc64le-linux; also bootstrapped the
latter with --enable-checking=release and -O3 (the PR67045 case). Will
do an --enable-checking=yes,rtl as well.
Segher
2015-07-29 Segher
The two *-match.o files always finish building last, so if we
start building them as soon as possible (instead of pretty late) the
total build time will be less on a parallel build.
Bootstrapped and tested on powerpc64-linux. Is this okay for trunk?
Segher
2014-080-3 Segher
ve a reason why you want the entry stack address instead of the
frame address, but you didn't really explain I think? Or I missed it.
Segher
for
a count of 0; but making it target-specific is certainly more conservative.
You say i386 doesn't have that target macro defined currently. Yes I know;
so change that? Or change the generic code, but that is much more testing.
Segher
>> > the
> >> > frame address, but you didn't really explain I think? Or I missed it.
What would a C program do with this, that it cannot do with the frame
address, that would be useful and cannot be much better done in straight
assembler? Do you actually want to expose the argument pointer, maybe?
Segher
>
> Heh, indeed. Maybe instead do
>
> .insert_from_file
>
> and do that only when we are using -pipe or so.
That's ".incbin". Do we really want to go through the headaches of using
extra files though? Is this really a bottleneck? Will it even help?
Segher
With the new shrink-wrap algorithm, blocks reachable both with and
without prologue are duplicated, and their incoming edges are then
distributed accordingly. So we need to call fixup_partitions.
Is this okay for trunk?
Segher
2015-09-16 Segher Boessenkool
PR bootstrap/67587
ders call DONE. You can instead write
those patterns as just USEs of their operands?
Segher
> list):
We also need it to avoid the "-jN check loses most results in the summary"
problem; or it seems we need to avoid 1.5.2 only for that, if I read the
log correctly.
Segher
er expand, so relying on combine to combine
all that plus the following splat (as Richard suggests below) is not
really going to work.
If there also are targets where the _scal version is cheaper, maybe
we should keep both, and have expand expand to whatever the target
supports?
Segher
is the same test as was there before; I haven't measured what the
impact of this suboptimality is.
Bootstrapped and tested on powerpc64-linux. Is this okay for mainline?
Segher
2015-09-18 Segher Boessenkool
* function.c (thread_prologue_and_epilogue_insns): Delete
or
ump over the new block (it was not necessarily a
fallthrough edge, could be EDGE_COMPLEX). In most cases we could do it
of course. I would prefer not to add special case code (that is not as
well tested).
Segher
On Tue, Sep 22, 2015 at 05:47:25PM +0200, Bernd Schmidt wrote:
> On 09/21/2015 04:01 PM, Segher Boessenkool wrote:
> >On Mon, Sep 21, 2015 at 01:56:28PM +0200, Bernd Schmidt wrote:
> >>>+ basic_block new_bb = create_empty_bb (EXIT_BLOCK_PTR_FOR_FN
> >>>(cfun)->
On Tue, Sep 22, 2015 at 11:11:47AM -0500, Segher Boessenkool wrote:
> > So, given that your solution seems to work, the patch is ok.
>
> Thanks! Here is what I will commit after bootstrap+test (the superfluous
> assert removed, and the comment for the code quoted above tweaked a
step works.
Bootstrapped and tested on powerpc64-linux. There are two new fails in
guality testresults for -Os.
Is this okay for mainline?
Segher
Segher Boessenkool (4):
bb-reorder: Split out STC
bb-reorder: Add the "simple" algorithm
bb-reorder: Add -freorder-blocks-algorithm= and wir
This first patch simply factors code a little bit.
2015-09-23 Segher Boessenkool
* bb-reorder.c (reorder_basic_blocks_software_trace_cache): New
function, factored out from ...
(reorder_basic_blocks): ... here.
---
gcc/bb-reorder.c | 29
disregard loops (which we cannot allow) and the
complications of block partitioning.
2015-09-23 Segher Boessenkool
* bb-reorder.c (reorder_basic_blocks_software_trace_cache): Print
a header to the dump file.
(edge_order): New function.
(reorder_basic_blocks_simple
This updates the documentation for the new option and new defaults.
2015-09-23 Segher Boessenkool
* doc/invoke.texi (Optimization Options): Add
-freorder-blocks-algorithm=.
(Optimize Options) <-O>: Add -freorder-blocks.
<-O2>: Remove -freorder-
the changes are for -O1 (which now gets "simple" instead of
nothing), -Os (which now gets "simple" instead of "stc", since STC results
in much bigger code), and for targets that wish to never use STC (not in
this patch though).
2015-09-23 Segher Boessenkool
This seems like an obvious typo. I cannot test SPE, but I noticed
this offset shows up in the debug output for normal configurations.
The condition is inverted, compared to all similar ones.
Is this okay for trunk?
Segher
2015-09-23 Segher Boessenkool
* config/rs6000/rs6000.c
On Thu, Sep 24, 2015 at 11:56:22AM +0200, Bernd Schmidt wrote:
> On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> >The current basic block reordering always uses the "software trace cache"
> >algorithm. That has a few problems:
> >
> >1) It increases code
On Thu, Sep 24, 2015 at 12:32:59PM +0200, Bernd Schmidt wrote:
> On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> >This is the meat of this series: a new algorithm to do basic block
> >reordering. It uses the simple greedy approach to maximum weighted
> >matching, wher
On Thu, Sep 24, 2015 at 12:28:08PM +0200, Bernd Schmidt wrote:
> On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> >This adds an -freorder-blocks-algorithm=[simple|stc] flag, with "simple"
> >as default. For -O2 and up (except -Os) it is switched to "stc" inst
On Thu, Sep 24, 2015 at 08:39:30AM -0500, Segher Boessenkool wrote:
> > Any thoughts on this vs qsort? Do you need a stable sort?
>
> We always need stable sorts in GCC; things are not reproducible across
> targets with qsort (not every qsort is the same).
s/targets/hosts/
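To illustrate the reproducibility point: qsort leaves the relative order
of equal elements unspecified, so two hosts' C libraries can legitimately
return different orders. A minimal sketch of one common workaround follows,
assuming a hypothetical weighted_item record (not a type from the patch):
record each element's original position and use it as a tie-breaker, so
that no two elements ever compare equal. This is not necessarily what the
patch under review does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: a weight plus the element's original position.  */
struct weighted_item
{
  int weight;
  size_t index;   /* Position before sorting; used only to break ties.  */
};

/* Order by descending weight; equal weights keep their original order.
   Since distinct elements never compare equal, any conforming qsort
   must produce the same result.  */
static int
compare_items (const void *pa, const void *pb)
{
  const struct weighted_item *a = pa;
  const struct weighted_item *b = pb;

  if (a->weight != b->weight)
    return a->weight > b->weight ? -1 : 1;
  if (a->index != b->index)
    return a->index < b->index ? -1 : 1;
  return 0;
}

/* Record the original positions, then sort.  */
static void
sort_items_reproducibly (struct weighted_item *items, size_t n)
{
  assert (items != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    items[i].index = i;
  if (n > 1)
    qsort (items, n, sizeof items[0], compare_items);
}
```

The cost is one extra field per element; a genuinely stable sort routine
avoids that, at the price of not using the host's qsort at all.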
On Thu, Sep 24, 2015 at 08:12:55AM -0700, Andi Kleen wrote:
> Segher Boessenkool writes:
> >
> > In effect, the changes are for -O1 (which now gets "simple" instead of
> > nothing), -Os (which now gets "simple" instead of "stc", since STC res
On Thu, Sep 24, 2015 at 06:03:33PM +0200, Steven Bosscher wrote:
> On Thu, Sep 24, 2015 at 12:06 AM, Segher Boessenkool wrote:
> > + /* First, collect all edges that can be optimized by reordering blocks:
> > + simple jumps and conditional jumps, as well as the function
v2 changes:
- Add a file header comment;
- Use "for" loop initial declarations;
- Handle asm goto.
Testing this on x86_64-linux; okay if it succeeds?
Segher
2015-09-99 Segher Boessenkool
* bb-reorder.c: Add intro comment.
(reorder_basic_blocks_software_trace_cac
On Fri, Sep 25, 2015 at 10:59:37AM -0500, Peter Bergner wrote:
> On Fri, 2015-09-25 at 09:16 -0500, Segher Boessenkool wrote:
> > (reorder_basic_blocks): Choose between the STC and the simple
> > algorithms (always choose the former).
> [snip]
> @@ -2274,7 +2444,10 @@
track down, but it's usually worth the time spent in the end.
Compile with -dap (directly with cc1 or with -S), find some difference
in the generated asm, see what the RTL insn number for that was (that's
what -dp is for), find where the difference came from (from the dumpfiles,
that's -da; you probably already know what pass is the culprit ;-) )
Segher
>
> ratio with the optimization vs ratio without optimization for FP benchmarks
> ( 4668.743 vs 4778.741)
Did you swap these? You're saying FP got significantly worse?
Segher
ong time.
We can at least change the default to LRA, so new ports get it unless
they like to hurt themselves.
I don't think it makes sense to keep reload around *just* for the ports
that are in "maintenance mode": by the time we are down to *just* those
ports, it makes more sense to relabel them as "unmaintained".
Segher
Double-quoted words in Tcl have substitutions performed on them, including
backslash substitutions. That isn't terribly nice for regular expressions,
so use braced words instead.
Tested on powerpc64-linux. Okay for mainline?
Segher
2015-09-28 Segher Boessenkool
gcc/test
ort a significant priority. I'd be
> more likely to push for deprecating cc0 targets first.
It looks like most cc0 targets would be pretty easy to convert, if anyone
can do testing anyway ;-)
Segher
-m64/-mlra); new testcase fails before, works
after (on 32-bit).
Is this okay for mainline?
Segher
2015-09-30 Segher Boessenkool
PR target/67788
PR target/67789
* config/rs6000/rs6000.c (TARGET_CANNOT_COPY_INSN_P): New.
(rs6000_cannot_copy_insn_p): New function
On Thu, Oct 01, 2015 at 12:14:44PM +0200, Richard Biener wrote:
> On Thu, Oct 1, 2015 at 8:08 AM, Segher Boessenkool
> wrote:
> > After the shrink-wrapping patches the prologue will often be pushed
> > "deeper" into the function, which in turn means the software tr
't save the call to the target hook, and
that is a big part of the overhead already.
Segher
On Fri, Oct 02, 2015 at 10:24:07AM +0930, Alan Modra wrote:
> On Thu, Oct 01, 2015 at 12:18:08PM -0500, Segher Boessenkool wrote:
> > On Thu, Oct 01, 2015 at 12:14:44PM +0200, Richard Biener wrote:
> > > So even if not "easy", can you try?
> >
> > I did,
7861.
> +case BUILT_IN_ACC_ON_DEVICE:
> + return gimple_fold_builtin_acc_on_device (gsi,
> + gimple_call_arg (stmt, 0));
> default:;
> }
Segher
4 stc x86_64
1905733 1905733 1905733 1905733 1905733 1905733 - xtensa
Is this okay for trunk?
Segher
2015-10-08 Segher Boessenkool
PR rtl-optimization/67864
* gcc/bb-reorder (reorder_basic_blocks_simple): Prefer existing
fallthrough edges for condit
icantly improve on unconditional
branches. I'm sure there are better heuristics possible but I don't know
them (and actually *solving* the problem isn't even polynomial of course).
Segher
On Fri, Oct 09, 2015 at 12:35:46PM +0200, Bernd Schmidt wrote:
> On 10/08/2015 06:57 PM, Segher Boessenkool wrote:
> >As the PR points out, the "simple" reorder algorithm makes bigger code
> >than the STC algorithm did, for -Os, for x86. I now tested it for many
> >
For x86, STC still gives better results for optimise-for-size than
"simple" does. So use STC at -Os as well.
Is this okay for trunk?
Segher
2015-10-16 Segher Boessenkool
PR rtl-optimization/67864
* common/config/i386/i386-common.c (ix86_option_optimiza
For mn10300, STC still gives better results for optimise-for-size than
"simple" does. So use STC at -Os as well.
Is this okay for trunk?
Segher
2015-10-16 Segher Boessenkool
* common/config/mn10300/mn10300-common.c
(mn10300_option_optimization_table) :
On Fri, Oct 16, 2015 at 02:55:54PM +0200, Bernd Schmidt wrote:
> On 10/16/2015 02:53 PM, Segher Boessenkool wrote:
> >For x86, STC still gives better results for optimise-for-size than
> >"simple" does. So use STC at -Os as well.
>
> For how many targets is this tr
" G "}}"
> +#elif DEFAULT_LIBC == LIBC_MUSL
> +#define CHOOSE_DYNAMIC_LINKER(G, U, M) \
> + "%{mglibc:" G ";:%{muclibc:" U ";:" M "}}"
> #else
> #error "Unsupported DEFAULT_LIBC"
> #endif
This doesn't really scale, I wonder if some more elegant non-quadratic
way is possible? Not that I expect terribly many other libcs to show
up in the near future ;-)
Segher
try the to use the const_double directly
> but rather its constant pool reference.
What happens if the constant pool reference is actually the better
code, do we still generate that?
Segher
c))) x.c:21 472 {*branch_equalitysi}
> (expr_list:REG_DEAD (reg:DI 207)
> (int_list:REG_BR_PROB 8010 (nil)))
> -> 35)
Why does *branch_equalitysi allow these subregs if it cannot handle them?
Not just combine can generate code like this (in theory at least).
Segher
e-optimizations is default at -O2.
> +
> +#include
Does every target have that header? Shouldn't it be ?
> --- a/gcc/testsuite/gcc.dg/torture/pr67736.c
> +++ b/gcc/testsuite/gcc.dg/torture/pr67736.c
> @@ -0,0 +1,32 @@
> +/* { dg-do run } */
> +
> +#include
> +
And here you don't need inttypes at all? Confused.
Segher
e additional patch in this
> selection, to restrict __float128 and IBM extended double from being combined
> in an expression.
#04, #05 are missing the patch.
Segher
For 32-bit targets p8vector_ok does not imply we have int128.
Tested with -m32,-m32/-mpowerpc64,-m64; okay for trunk?
Segher
2015-10-26 Segher Boessenkool
gcc/testsuite/
* gcc.target/powerpc/p8vector-builtin-8.c: Add "target int128".
---
gcc/testsuite/gcc.target/powerp
The patterns involved can create vmadd resp. vnmsub instructions instead.
This patch changes the testcases to allow those.
Tested with -m32,-m32/-mpowerpc64,-m64; okay for trunk?
Segher
2015-10-26 Segher Boessenkool
gcc/testsuite/
* gcc.target/powerpc/vsx-builtin-2.c: Allow vmadd
" } { "" } } */
Some of this is redundant. Should this really be linux-only?
The second line doesn't do anything then. { "*" } { "" } isn't
needed either.
In general, please only add dg-skip-if if you know it is needed.
There probably should be an effective-target for float128.
Segher
solve the rs6000
bootstrap problems with LRA enabled by default (it ICEd building libitm),
with no apparent ill effects.
It seems other targets can do without this. rs6000 uses this construct
inside of a PARALLEL, maybe that is the difference?
Any hints appreciated!
Segher
2015-10-29 Segher B
and the previous LRA patch there now are no differences
between -mlra and -mno-lra testsuite runs (on big-endian power7, gcc110).
Is this okay for trunk?
Segher
2015-10-29 Segher Boessenkool
* config/rs6000/rs6000.c (rs6000_reg_live_or_pic_offset_p): Move this
function
variants {-m32/-mno-lra,-m32/-mlra,-m32/-mpowerpc64,-m64/-mno-lra,-m64/-mlra};
and on powerpc64le-linux, everything default. Both also with bootstrapping
with LRA defaulted on.
Okay for trunk?
Segher
2015-10-31 Segher Boessenkool
* config/rs6000/rs6000.c (rs6000_reg_live_or_pic_
This function is quite a puzzle; untangle it. No functional change.
Tested etc.; okay for trunk?
Segher
2015-10-31 Segher Boessenkool
* config/rs6000/rs6000.c (rs6000_reg_live_or_pic_offset_p): Rewrite.
---
gcc/config/rs6000/rs6000.c | 35 ---
1
ETs (one of PC).
Segher
On Mon, Nov 02, 2015 at 02:38:33PM +, Alan Lawrence wrote:
> On 23/09/15 23:06, Segher Boessenkool wrote:
> >This adds an -freorder-blocks-algorithm=[simple|stc] flag, with "simple"
> >as default. For -O2 and up (except -Os) it is switched to "stc" instead.
ar
> +instructions that were added in version 2.07 of the PowerPC ISA. Also
> +enable the use of built-in functions that allow more direct access to
> +the vector instructions.
3.0 here as well?
Segher
> The testcase in the patch is the most minimal one I could get that
> demonstrates the issue I'm trying to solve.
>
> Does this approach look ok?
In broad lines it does. Could you try this patch instead? Not tested
etc. (other than building an aarch64 cross and your test case
meaning of "optimization"; anything else
is arguably a bug. But people have many different understandings
of what a "compiler optimization" is, all the way to "anything the
compiler does".
Segher
On Thu, Nov 05, 2015 at 02:04:47PM -0700, Martin Sebor wrote:
> On 11/05/2015 10:09 AM, Segher Boessenkool wrote:
> >On Thu, Nov 05, 2015 at 08:58:25AM -0700, Martin Sebor wrote:
> >>I don't think that reiterating in a condensed form what the manual
> >>doesn'
ortex-a53, or you just get a divide insn).
> Is there a way that subst can signal some kind of "failed to substitute"
> result?
Yep, see new patch. The "from == to" condition is for when subst is called
just to simplify some code (normally with pc_rtx, pc_rtx).
> If n
g else
but registers (immediates, memory, ...). This probably is a reasonable
tradeoff for all targets, even those (if any) that have such insns.
> >I'll let you put it through its paces on your setup :)
> I'll let Segher give the final yes/no on this, but it generally lo
Adding x86 maintainer, ping?
On Fri, Oct 16, 2015 at 05:53:41AM -0700, Segher Boessenkool wrote:
> For x86, STC still gives better results for optimise-for-size than
> "simple" does. So use STC at -Os as well.
>
> Is this okay for trunk?
>
>
> Segher
>
>
rator is applied to this
pointer explicitly, or implicitly as a result of subscripting, the
result is the referenced (n - 1)-dimensional array, which itself is
converted into a pointer if used as other than an lvalue. It follows
from this that arrays are stored in row-major order (last subscript
varies fastest).
As far as I see, a5_7[5] here is never treated as an array, just as a
pointer, and &a5_7[5][0] is valid.
Segher
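The row-major rule quoted above can be checked directly. The sketch below
assumes a 5 x 7 array of int (dimensions guessed from the name a5_7; the
real declaration is not shown in this excerpt) and compares the offset
predicted by "last subscript varies fastest" with the offset the compiler
actually uses. It deliberately stays within bounds, so it says nothing
either way about the one-past-the-end expression &a5_7[5][0].

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 5, COLS = 7 };   /* Assumed dimensions.  */

/* Byte offset of a[i][j] predicted by row-major layout.  */
static size_t
row_major_offset (size_t i, size_t j)
{
  assert (i < ROWS && j < COLS);
  return (i * COLS + j) * sizeof (int);
}

/* Byte offset of a[i][j] as the compiler actually lays the array out.  */
static size_t
actual_offset (size_t i, size_t j)
{
  static int a[ROWS][COLS];

  assert (i < ROWS && j < COLS);
  return (size_t) ((char *) &a[i][j] - (char *) a);
}
```

For every valid (i, j) the two functions return the same value, which is
just the quoted text restated as arithmetic.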
On Fri, Nov 06, 2015 at 04:00:08PM -0600, Segher Boessenkool wrote:
> This patch stops combine from generating widening muls of anything else
> but registers (immediates, memory, ...). This probably is a reasonable
> tradeoff for all targets, even those (if any) that have such insns.
>
. Is this okay for trunk?
Segher
2015-11-09 Segher Boessenkool
* gcc/bb-reorder.c (reorder_basic_blocks_simple): Treat a conditional
branch with only one successor just like unconditional branches.
---
gcc/bb-reorder.c | 6 +++---
1 file changed, 3 insertions(+), 3
On Sun, Nov 08, 2015 at 08:21:47PM -0700, Jeff Law wrote:
> On 11/08/2015 08:09 PM, Segher Boessenkool wrote:
> >The code mistakenly thinks any cond_jump has two successors. This is
> >not true if both destinations are the same, as can happen with weird
> >
The testcase used to fail on 64-bit, but it was disabled there.
This patch makes it run there, and beefs up the checking of the
generated code a bit.
Tested on powerpc64-linux (-m32,-m32/-mpowerpc64,-m64).
Is this okay for trunk?
Segher
2015-11-09 Segher Boessenkool
gcc/testsuite
for trunk?
Segher
2015-11-09 Segher Boessenkool
* gcc/simplify-rtx.c (simplify_truncation): Simplify TRUNCATE
of AND of [LA]SHIFTRT.
---
gcc/simplify-rtx.c | 25 +
1 file changed, 25 insertions(+)
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-r
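The ChangeLog entry above can be modelled in plain C. This is only a
sketch: it assumes an 8-bit narrow mode and a 32-bit wide mode, handles the
logical shift only, and the function names are made up for illustration.
The safety check shown (the two forms agree for an all-ones input) is a
sufficient condition; the exact condition used by the patch is not visible
in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wide form: shift and mask in 32 bits, then truncate to 8 bits.  */
static uint8_t
wide_then_truncate (uint32_t x, unsigned shift, uint32_t mask)
{
  assert (shift < 32);
  return (uint8_t) ((x >> shift) & mask);
}

/* Narrow form: truncate to 8 bits first, then shift and mask.  */
static uint8_t
truncate_then_narrow (uint32_t x, unsigned shift, uint32_t mask)
{
  assert (shift < 8);
  uint8_t low = (uint8_t) x;
  return (uint8_t) ((low >> shift) & (uint8_t) mask);
}

/* Sufficient condition for both forms to agree for every x: the mask
   selects no bit that the shift brings in from above bit 7.  Checking
   an all-ones input is enough to establish that.  */
static bool
narrowing_is_safe (unsigned shift, uint32_t mask)
{
  if (shift >= 8)
    return false;
  return ((UINT32_C (0xff) >> shift) & mask)
	 == ((UINT32_C (0xffffffff) >> shift) & mask);
}
```

With a shift of 3, a mask of 0x1f is safe (only bits 3..7 of x survive),
while a mask of 0xff is not (bits 8..10 of x would leak into the result of
the wide form but not the narrow one).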
On Mon, Nov 09, 2015 at 08:52:13AM +0100, Uros Bizjak wrote:
> On Sun, Nov 8, 2015 at 9:58 PM, Segher Boessenkool
> wrote:
> > On Fri, Nov 06, 2015 at 04:00:08PM -0600, Segher Boessenkool wrote:
> >> This patch stops combine from generating widening muls of anything el
operand" "")
> + (match_operand:GPR 2 "reg_or_cint_operand" "")))]
You could delete the empty constraint strings while you're at it.
> +;; On machines with modulo support, do a combined div/mod the old fashioned
> +;; method, since the multiply/subtract is faster than doing the mod
> instruction
> +;; after a divide.
You can instead have a "divmod" insn that is split to either of div, mod,
or div+mul+sub depending on which of the outputs is unused. Peepholes
do not get all cases.
This can be a later improvement of course.
Segher
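For reference, the div+mul+sub sequence mentioned above computes the
remainder from an already available quotient. A minimal C sketch, with a
hypothetical divmod_result type and function name (neither is from the
patch):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of results.  */
struct divmod_result
{
  int32_t quot;
  int32_t rem;
};

/* One divide, then multiply and subtract to recover the remainder,
   instead of issuing a separate modulo operation.  */
static struct divmod_result
divmod_via_mul_sub (int32_t a, int32_t b)
{
  struct divmod_result r;

  assert (b != 0);
  assert (!(a == INT32_MIN && b == -1));   /* Quotient would overflow.  */
  r.quot = a / b;
  r.rem = a - r.quot * b;   /* Same value as a % b.  */
  return r;
}
```

If only one of the two fields is used afterwards, the other computation is
dead, which is the situation the suggested split on unused outputs is meant
to exploit.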
t (match_operand:GPR 0 "gpc_reg_operand" "=r")
> + (ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
> + "TARGET_CTZ"
> + "cnttz %0,%1"
> + [(set_attr "type" "cntlz")])
We should probably rename this attr value now. "cntz" maybe? Could be
later of course.
Segher
("always"). You can write this as
{\mlwa\M} for more sanity.
> +/* { dg-final { scan-assembler-not "sldi "} } */
> +/* { dg-final { scan-assembler-not "sldi\\. " } } */
Similarly {\msldi\M} catches both.
Segher
> + (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> + (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> + (clobber (match_scratch:DI 3 "=X,&b"))]
> + "TARGET_TOC_FUSION_INT"
Do you need that "??r" alternative? Same for the next define_insn.
Big patch, most looks good :-)
Segher
:SI 1 "nonimmediate_operand" "r,Z,r,Z")
> + (match_operand:SI 2 "const_0_to_1_operand" "O,O,n,n")]
> + UNSPEC_IEEE128_MOVE))]
> + "TARGET_FLOAT128_HW"
> + "@
> + mtvsrwa %x0,%1
> + lxsiwax %x0,%y1
> + mtvsrwz %x0,%1
> + lxsiwzx %x0,%y1"
> + [(set_attr "type" "mffgpr,fpload,mffgpr,fpload")])
Tricky, is there no cleaner way to do this?
Segher
z" maybe? Could be
> > later of course.
>
> I don't see a need to add another type attribute for count trailing zeros
> unless count leading zeros has a different timing than count trailing zeros.
I didn't suggest adding a "cnttz"; I suggested renaming "cntlz". Maybe
"ctz" is better, that's what the target flag is as well.
Cheers,
Segher
On Mon, Nov 09, 2015 at 12:27:34PM -0500, Michael Meissner wrote:
> On Mon, Nov 09, 2015 at 10:29:10AM -0600, Segher Boessenkool wrote:
> > On Sun, Nov 08, 2015 at 07:39:14PM -0500, Michael Meissner wrote:
> > > +;; Pretend we have a memory form of extswsli until register allocat
ed for the ADDIS instruction), but it can be used for power9
> fusion (where the ADDIS must be adjacent, but it no longer has to be the
> register being loaded).
If you have only "b", r0 will not be chosen. Does that help? Or are
you generating this pattern from somewhere else where you put in r0?
Segher
On Sun, Nov 08, 2015 at 07:48:56PM -0500, Michael Meissner wrote:
> This patch adds support for the new direct move instructions (MFVSRLD and
> MTVSRDD) that simplify moving 128-bit data between GPRs and vector registers.
You forgot to attach the patch :-)
Segher
On Tue, Nov 10, 2015 at 12:16:09PM +0100, Bernd Schmidt wrote:
> On 11/09/2015 08:33 AM, Segher Boessenkool wrote:
> >If we have
> >
> > (truncate:M1 (and:M2 (lshiftrt:M2 (x:M2) C) C2))
> >
> >we can write it instead as
> >
> > (a
On Mon, Nov 09, 2015 at 03:51:32AM -0600, Segher Boessenkool wrote:
> > From the original patch submission, it looks that this patch would
> > also benefit x86_32.
>
> Yes, that is what I thought too.
>
> > Regarding the above code size increase - do you perha
,
> including d-form addressing).
>
> Are these patches ok to check in?
You forgot the patch again, it must be a curse ;-)
Segher
On Tue, Nov 10, 2015 at 10:04:30PM +0100, Bernd Schmidt wrote:
> On 11/10/2015 06:44 PM, Segher Boessenkool wrote:
>
> >Yes I know. All the rest of the code around it is like this though.
> >Do you want this written in a saner way?
>
> I won't object to leaving
ed to match this instruction:
> (set (reg:DI 70 [ _2 ])
> (sign_extend:DI (lshiftrt:SI (subreg:SI (reg/v:DI 80 [ x ]) 0)
> (const_int 16 [0x10]
Somehow, before the patch, it decided to do a zero-extension (where the
combined insns had a sign extension). Was that even correct? Maybe
many bits of reg 80 (or, hrm, 81 in the orig?!) are known zero?
Segher
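One observation that may bear on the question, independent of what is known
about the source register: after a logical right shift by a nonzero count,
the top bit of the 32-bit result is always zero, so sign-extending and
zero-extending that result to 64 bits give the same value. A small sketch,
with helper names invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits without relying on
   implementation-defined conversions.  */
static int64_t
sign_extend32 (uint32_t v)
{
  assert (sizeof (int64_t) == 8);
  if (v & UINT32_C (0x80000000))
    return (int64_t) v - (INT64_C (1) << 32);
  return (int64_t) v;
}

/* Zero-extend a 32-bit value to 64 bits.  */
static int64_t
zero_extend32 (uint32_t v)
{
  return (int64_t) v;
}
```

For x = 0xffff1234, sign_extend32 (x >> 16) and zero_extend32 (x >> 16) are
both 0xffff, whereas without the shift the two extensions of x differ.
Whether that is the reasoning combine actually applied here is not
something this excerpt shows.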
ase, this is some systematic oversight), but we can live
> with it.
After the patch it will no longer combine an imul reg,reg (+ mov) into an
imul mem,reg. _Most_ cases that end up as mem,reg are already expanded
as such, but not all. It's hard to make a smallish testcase.
Segher
d.
It seems that without -flto TOC fusion doesn't do much at all, btw?
Segher