--- Comment #50 from rguenth at gcc dot gnu dot org 2010-04-05 14:23 ---
(In reply to comment #49)
> At least the tree-vrp.c bit did not get applied (as of trunk r157950)
>
Yup, my fault. I looked at the wrong patch. Thus, the first comment
applies - maybe stage1 with lots of cleanu
--- Comment #49 from steven at gcc dot gnu dot org 2010-04-05 13:01 ---
At least the tree-vrp.c bit did not get applied (as of trunk r157950)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #48 from rguenther at suse dot de 2010-04-05 12:56 ---
Subject: Re: [4.4/4.5 Regression] 50% performance regression
On Mon, 5 Apr 2010, rguenther at suse dot de wrote:
> --- Comment #47 from rguenther at suse dot de 2010-04-05 12:54 ---
> Subject: Re: [4.4/4.5 R
--- Comment #47 from rguenther at suse dot de 2010-04-05 12:54 ---
Subject: Re: [4.4/4.5 Regression] 50% performance regression
On Mon, 5 Apr 2010, steven at gcc dot gnu dot org wrote:
> --- Comment #46 from steven at gcc dot gnu dot org 2010-04-05 12:52 ---
> What happen
--- Comment #46 from steven at gcc dot gnu dot org 2010-04-05 12:52 ---
What happened with the patch of comment #33?
--
jakub at gcc dot gnu dot org changed:
What|Removed |Added
Target Milestone|4.4.3 |4.4.4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #45 from rguenth at gcc dot gnu dot org 2009-12-19 21:10 ---
(In reply to comment #41)
> Indeed. The PRE issue could be fixed by fixing PR38819 not in the way it is
> done now but "properly" detect the invalid situations during ANTIC computation
> and simply never mark trap
--- Comment #44 from rguenth at gcc dot gnu dot org 2009-12-19 19:41 ---
PR42436 now tracks the possible VRP and middle-end improvement. Only the
PRE fixing possibility would count as a regression fix IMHO.
--- Comment #43 from rguenth at gcc dot gnu dot org 2009-12-19 19:29 ---
Btw, with the patch from comment #33 LIM will now hoist the division
properly and the performance regression would be fixed(?). The patch
will though likely cause verification issues with -fnon-call-exceptions
for
--- Comment #42 from rguenth at gcc dot gnu dot org 2009-12-19 11:25 ---
Subject: Bug 42108
Author: rguenth
Date: Sat Dec 19 11:24:49 2009
New Revision: 155360
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155360
Log:
2009-12-19 Richard Guenther
PR tree-optimizatio
--- Comment #41 from rguenth at gcc dot gnu dot org 2009-12-18 23:44 ---
Indeed. The PRE issue could be fixed by fixing PR38819 not in the way it is
done now but "properly" detect the invalid situations during ANTIC computation
and simply never mark trapping expressions so. At the cur
--- Comment #40 from matz at gcc dot gnu dot org 2009-12-18 21:40 ---
That's expected. There are three problems and the patch in comment #38 hacks
around only one of them.
--- Comment #39 from dominiq at lps dot ens dot fr 2009-12-18 21:04 ---
The patch in comment #38 does not fix the speed issue: the code with the inner
loop is still 4 times slower than the code with the loop manually unrolled.
Note that the included test regtests successfully.
--- Comment #38 from rguenth at gcc dot gnu dot org 2009-12-18 15:43 ---
Created an attachment (id=19346)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19346&action=view)
patch to fix SCCVN issue
This patch fixes the SCCVN issue, I'm giving it more testing.
--- Comment #37 from rguenth at gcc dot gnu dot org 2009-12-15 11:08 ---
No, there isn't. I'd simply allow TREE_THIS_NOTRAP on all expression codes
that in principle could. Now of course the middle-end would still need to
make use of this (like transition it to a stmt flag on a tuple)
--- Comment #36 from tkoenig at gcc dot gnu dot org 2009-12-15 07:09 ---
If it is any help, code which traps for a do loop is illegal Fortran,
so the compiler may do anything in this case anyway.
Is there a function like
__builtin_i_dont_care_if_this_traps_or_not_if_it_traps_its_the_us
--- Comment #35 from matz at gcc dot gnu dot org 2009-12-14 16:58 ---
Exactly my thinking (growing SCCs -> slow, sorting SCCs -> difficult).
What I thought about the trapping problem is that in this situation we could
ignore the trap test. We start with this situation:
bb1:
goto bb2
--- Comment #34 from rguenth at gcc dot gnu dot org 2009-12-14 12:57 ---
"Another possibility is to artificially grow SCCs and
their dependencies by honoring dominating virtual operand uses, not only
defs (ugh)."
What I mean by this is that when finding SCCs we process all uses of a
--- Comment #33 from rguenth at gcc dot gnu dot org 2009-12-14 12:30 ---
Created an attachment (id=19289)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19289&action=view)
VRP hack
Hack marking divisions non-trapping during VRP (re-using some stmt bit, not
updating relevant places
--- Comment #32 from rguenther at suse dot de 2009-12-14 12:27 ---
Subject: Re: [4.4/4.5 Regression] 50% performance regression
On Mon, 14 Dec 2009, matz at gcc dot gnu dot org wrote:
> --- Comment #26 from matz at gcc dot gnu dot org 2009-12-14 04:55 ---
> And if I fix this
--- Comment #31 from rguenth at gcc dot gnu dot org 2009-12-14 11:49 ---
Created an attachment (id=19288)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19288&action=view)
another hack
Sorts the SCCs after collecting them all. Breaks most of the PRE/FRE testcases
because it sorts
--- Comment #30 from rguenth at gcc dot gnu dot org 2009-12-14 11:23 ---
I fail to see why FRE does not remove the redundant load of *n_9(D). Oh, it
is because we first value-number D.1537_58 = *n_9(D); and only after it
we value-number D.1529_45 = *n_9(D);
This is because while we vi
--- Comment #29 from dominiq at lps dot ens dot fr 2009-12-14 11:21 ---
On x86_64-apple-darwin10, I don't see any speedup with the patch in comment #27
(not a clean bootstrap, but just an incremental build).
--- Comment #28 from dominiq at lps dot ens dot fr 2009-12-14 10:51 ---
(In reply to comment #27)
> My current collection of patches and hacks for this problem. Obviously the
> "if (0)" in tree-ssa-pre.c will break pr38819 again; apart from that untested,
> hence probably miscompiles ev
--- Comment #27 from matz at gcc dot gnu dot org 2009-12-14 05:25 ---
Created an attachment (id=19287)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19287&action=view)
three hacks
My current collection of patches and hacks for this problem. Obviously the
"if (0)" in tree-ssa-pre.
--- Comment #26 from matz at gcc dot gnu dot org 2009-12-14 04:55 ---
And if I fix this problem (so that only one reference to *n_9 remains)
I hit the problem that the Fortran frontend emits the computation of countm1
after the loop bound test. No pass is moving code in front of that te
--- Comment #25 from matz at gcc dot gnu dot org 2009-12-13 23:48 ---
The reason that the testcase still is slow (and that the inner loop isn't
unrolled or vectorized) is still the calculation of countm1. The division
therein stays in the second inner loop, whereas with GCC 4.3 it can b
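
For readers unfamiliar with the issue, here is a minimal C sketch of the kind of loop-invariant division being discussed. It is not the Fortran testcase from this PR and not GCC's generated code; the function names, the array layout and the guard condition are all assumptions made for illustration. The first function recomputes a countm1-style trip count (a division) on every outer iteration; the second shows the result after the division has been hoisted by hand. Because an integer division can trap on a zero divisor, the hoisted form keeps the guard in front of the division.

```c
#include <stddef.h>

/* Hypothetical illustration only: countm1 = (hi - lo) / step stands in for
   the DO-loop trip count (minus one) discussed in the comments above. */

/* Division stays inside the outer loop and is redone on every iteration. */
static long sum_division_in_loop(const long *a, size_t outer,
                                 long lo, long hi, long step)
{
    long total = 0;
    for (size_t i = 0; i < outer; i++) {
        if (step > 0 && hi >= lo) {              /* loop bound test */
            long countm1 = (hi - lo) / step;     /* invariant in i */
            for (long k = 0; k <= countm1; k++)
                total += a[lo + k * step];
        }
    }
    return total;
}

/* Same computation with the division hoisted out of the outer loop.
   The guard is hoisted too, so the division never runs with step == 0. */
static long sum_division_hoisted(const long *a, size_t outer,
                                 long lo, long hi, long step)
{
    long total = 0;
    if (step > 0 && hi >= lo) {
        long countm1 = (hi - lo) / step;         /* computed once */
        for (size_t i = 0; i < outer; i++)
            for (long k = 0; k <= countm1; k++)
                total += a[lo + k * step];
    }
    return total;
}
```

Both functions return the same value for any input; the only difference is how often the division executes. Whether a compiler may perform this transformation on its own depends on whether it can prove the division does not trap, which is the point of the trapping discussion in the later comments.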
--- Comment #24 from dominiq at lps dot ens dot fr 2009-12-04 14:25 ---
AFAICT fixing pr42131 does not help.