https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
Wilco changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
Bernd Edlinger changed:
What|Removed |Added
Known to work||8.1.0
--- Comment #67 from Bernd Edling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
Martin Liška changed:
What|Removed |Added
CC||marxin at gcc dot gnu.org
--- Comment #66
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #65 from Bernd Edlinger ---
Author: edlinger
Date: Wed Sep 6 07:47:52 2017
New Revision: 251752
URL: https://gcc.gnu.org/viewcvs?rev=251752&root=gcc&view=rev
Log:
2017-09-06 Bernd Edlinger
PR target/77308
* confi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #64 from Bernd Edlinger ---
Author: edlinger
Date: Mon Sep 4 15:25:59 2017
New Revision: 251663
URL: https://gcc.gnu.org/viewcvs?rev=251663&root=gcc&view=rev
Log:
2017-09-04 Bernd Edlinger
PR target/77308
* confi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #63 from Bernd Edlinger ---
Author: edlinger
Date: Thu Nov 17 13:47:24 2016
New Revision: 242549
URL: https://gcc.gnu.org/viewcvs?rev=242549&root=gcc&view=rev
Log:
2016-11-17 Bernd Edlinger
PR target/77308
* confi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #62 from Bernd Edlinger ---
Both parts of the patch are now posted for review:
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00830.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #61 from Bernd Edlinger ---
Created attachment 39958
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39958&action=edit
patch for enabling ldrdstrd peephole
And this is what I will bootstrap in the next cycle.
It will enable all
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
Bernd Edlinger changed:
What|Removed |Added
Attachment #39940|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #59 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #58)
> (In reply to wilco from comment #57)
> > (In reply to Bernd Edlinger from comment #56)
> > > Agreed, I can split the patch.
> > >
> > > From what I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #57 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #56)
> (In reply to wilco from comment #55)
> > (In reply to Bernd Edlinger from comment #39)
> > > Created attachment 39940 [details]
> > > proposed patch,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #58 from Bernd Edlinger ---
(In reply to wilco from comment #57)
> (In reply to Bernd Edlinger from comment #56)
> > Agreed, I can split the patch.
> >
> > From what I understand, we should never emit ldrd/strd out of
> > the memmovd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #56 from Bernd Edlinger ---
(In reply to wilco from comment #55)
> (In reply to Bernd Edlinger from comment #39)
> > Created attachment 39940 [details]
> > proposed patch, v2
> >
> > last upload was accidentally truncated.
> > upload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #55 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #39)
> Created attachment 39940 [details]
> proposed patch, v2
>
> last upload was accidentally truncated.
> uploaded the right patch.
Right so looking at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #54 from Bernd Edlinger ---
(In reply to richard.earnshaw from comment #53)
> On 02/11/16 11:57, bernd.edlinger at hotmail dot de wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
> >
> > --- Comment #52 from Bernd Edling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #53 from richard.earnshaw at arm dot com ---
On 02/11/16 11:57, bernd.edlinger at hotmail dot de wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
>
> --- Comment #52 from Bernd Edlinger ---
> (In reply to wilco from commen
On 02/11/16 11:57, bernd.edlinger at hotmail dot de wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
>
> --- Comment #52 from Bernd Edlinger ---
> (In reply to wilco from comment #51)
>>
>> Indeed, that's the reason behind the existing check. However it disables all
>> profitable bswap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #52 from Bernd Edlinger ---
(In reply to wilco from comment #51)
>
> Indeed, that's the reason behind the existing check. However it disables all
> profitable bswap cases while still generating unaligned accesses if no bswap
> is nee
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #51 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #49)
> (In reply to Bernd Edlinger from comment #48)
> > (In reply to wilco from comment #22)
> > >
> > > Anyway, there is another bug: on AArch64 we corre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #50 from Richard Earnshaw ---
(In reply to wilco from comment #47)
> (In reply to Richard Earnshaw from comment #46)
> > (In reply to wilco from comment #44)
> > > (In reply to Bernd Edlinger from comment #38)
> > > > Created attachme
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #49 from Bernd Edlinger ---
(In reply to Bernd Edlinger from comment #48)
> (In reply to wilco from comment #22)
> >
> > Anyway, there is another bug: on AArch64 we correctly recognize there are 8
> > 1-byte loads, shifts and orrs wh
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #48 from Bernd Edlinger ---
(In reply to wilco from comment #22)
>
> Anyway, there is another bug: on AArch64 we correctly recognize there are 8
> 1-byte loads, shifts and orrs which can be replaced by a single 8-byte load
> and a by
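For readers unfamiliar with the pattern discussed in the quote above, here is a minimal C sketch of it. It is not code from the PR or from the sha512.c test case. The function names `load_be64_bytes` and `load_be64_bswap` are hypothetical, and the second variant assumes a GCC-compatible compiler that provides `__builtin_bswap64` and the `__BYTE_ORDER__` predefined macro.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: big-endian 64-bit load written as eight 1-byte
   loads combined with shifts and ORs. */
static uint64_t load_be64_bytes(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: the same value obtained with one 8-byte load
   followed by a byte reverse. memcpy keeps the access well-defined
   even when p is not 8-byte aligned. */
static uint64_t load_be64_bswap(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap64(v);   /* only needed on little-endian targets */
#endif
    return v;
}
```

Both functions return the same value for any 8-byte buffer. Whether the compiler actually turns the first form into the second depends on the target and on alignment assumptions, which is the point under discussion in the thread.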
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #47 from wilco at gcc dot gnu.org ---
(In reply to Richard Earnshaw from comment #46)
> (In reply to wilco from comment #44)
> > (In reply to Bernd Edlinger from comment #38)
> > > Created attachment 39939 [details]
> > > proposed patc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #45 from Bernd Edlinger ---
(In reply to wilco from comment #44)
> (In reply to Bernd Edlinger from comment #38)
> > Created attachment 39939 [details]
> > proposed patch, v2
> >
>
> > Unlike the previous patch, thumb1 stack usage s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #46 from Richard Earnshaw ---
(In reply to wilco from comment #44)
> (In reply to Bernd Edlinger from comment #38)
> > Created attachment 39939 [details]
> > proposed patch, v2
> >
>
> > Unlike the previous patch, thumb1 stack usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #44 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #38)
> Created attachment 39939 [details]
> proposed patch, v2
>
> Unlike the previous patch, thumb1 stack usage stays at 1588 bytes,
> because thumb1 can
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #43 from Bernd Edlinger ---
(In reply to wilco from comment #41)
>
> ARM only uses the 2nd alternative (set_attr "arch" "any,a,t2,t2"), so this
> is correct. There is no need to support this pattern for ARM as ARM doesn't
> have ORN,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #42 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #40)
> BTW: I found something strange in this pattern in neon.md:
>
> (define_insn_and_split "orndi3_neon"
> [(set (match_operand:DI 0 "s_register_operan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #41 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #40)
> BTW: I found something strange in this pattern in neon.md:
>
> (define_insn_and_split "orndi3_neon"
> [(set (match_operand:DI 0 "s_register_operan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #40 from Bernd Edlinger ---
BTW: I found something strange in this pattern in neon.md:
(define_insn_and_split "orndi3_neon"
[(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?&r")
(ior:DI (not:DI (match_operand:DI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
Bernd Edlinger changed:
What|Removed |Added
Attachment #39939|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
Bernd Edlinger changed:
What|Removed |Added
Attachment #39898|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #37 from Richard Earnshaw ---
(In reply to Bernd Edlinger from comment #34)
> (In reply to Richard Earnshaw from comment #33)
> > The logic is certainly strange. Some cores run LDRD less quickly than they
> > can do LDM, or even two
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #36 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #34)
> (In reply to Richard Earnshaw from comment #33)
> > (In reply to Wilco from comment #32)
> > > (In reply to Bernd Edlinger from comment #31)
> > > >
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #35 from wilco at gcc dot gnu.org ---
(In reply to Richard Earnshaw from comment #30)
> (In reply to wilco from comment #29)
> > Combine could help with
> > merging 2 loads/stores into a single instruction.
>
> No, combine works stri
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #34 from Bernd Edlinger ---
(In reply to Richard Earnshaw from comment #33)
> (In reply to Wilco from comment #32)
> > (In reply to Bernd Edlinger from comment #31)
> > > Furthermore, if I want to do -Os the third condition is FALSE t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #33 from Richard Earnshaw ---
(In reply to Wilco from comment #32)
> (In reply to Bernd Edlinger from comment #31)
> > Furthermore, if I want to do -Os the third condition is FALSE too.
> > But one ldrd must be shorter than two ldr ?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #32 from Wilco ---
(In reply to Bernd Edlinger from comment #31)
> Sure, combine can't help, especially because it runs before split1.
>
> But I wondered why this peephole2 is not enabled:
>
> (define_peephole2 ; ldrd
> [(set (matc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #31 from Bernd Edlinger ---
Sure, combine can't help, especially because it runs before split1.
But I wondered why this peephole2 is not enabled:
(define_peephole2 ; ldrd
[(set (match_operand:SI 0 "arm_general_register_operand" "")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #30 from Richard Earnshaw ---
(In reply to wilco from comment #29)
> Combine could help with
> merging 2 loads/stores into a single instruction.
No, combine works strictly on dataflow dependencies. Two stores cannot be
dataflow rel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #29 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #28)
> With my latest patch I bootstrapped a configuration with
> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
> --with-float=hard
>
> I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #28 from Bernd Edlinger ---
With my latest patch I bootstrapped a configuration with
--with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
--with-float=hard
I noticed a single regression in gcc.target/arm/pr53447-*.c
That i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #27 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #26)
> (In reply to wilco from comment #25)
> >
> > Alternatives can be disabled, there are flags, eg:
> >
> > (set_attr "arch" "neon_for_64bits,*,*,avoid
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #26 from Bernd Edlinger ---
(In reply to wilco from comment #25)
>
> Alternatives can be disabled, there are flags, eg:
>
> (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")
>
Ok I see, thanks.
Still lots of insns cou
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #25 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #24)
> (In reply to Bernd Edlinger from comment #23)
> > @@ -5020,7 +5020,7 @@
> > (define_insn_and_split "one_cmpldi2"
> >[(set (match_operand:DI 0 "s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #24 from Bernd Edlinger ---
(In reply to Bernd Edlinger from comment #23)
> @@ -5020,7 +5020,7 @@
> (define_insn_and_split "one_cmpldi2"
>[(set (match_operand:DI 0 "s_register_operand" "=w,&r,&r,?w")
> (not:DI (match_o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #23 from Bernd Edlinger ---
(In reply to wilco from comment #22)
>
> What I meant is that your patch still makes a large difference on the
> original test case despite making no difference in simple cases like the
> above.
For sure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #22 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #21)
> (In reply to wilco from comment #20)
> > > Wilco, where have you seen the additional registers used with my
> > > previous patch, maybe we can try to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #21 from Bernd Edlinger ---
(In reply to wilco from comment #20)
> > Wilco, where have you seen the additional registers used with my
> > previous patch, maybe we can try to fix that somehow?
>
> What happens is that the move of zero
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #20 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #19)
> I think the problem with anddi iordi and xordi instructions is that
> they obscure the data flow between low and high half words.
> When they are not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #19 from Bernd Edlinger ---
I think the problem with anddi iordi and xordi instructions is that
they obscure the data flow between low and high half words.
When they are not enabled, we have the low and high parts
expanded independent
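To make the low/high split concrete, here is a small C sketch of a 64-bit AND carried out as two independent 32-bit operations. It only illustrates the idea and does not reproduce the RTL that GCC generates. The names `pair32`, `split64`, `join64` and `and64_by_halves` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit value seen as two 32-bit halves. */
typedef struct { uint32_t lo, hi; } pair32;

static pair32 split64(uint64_t x)
{
    pair32 p = { (uint32_t)x, (uint32_t)(x >> 32) };
    return p;
}

static uint64_t join64(pair32 p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit AND done half by half: the low result depends only on the low
   inputs and the high result only on the high inputs, so each half can
   be tracked and allocated on its own. */
static uint64_t and64_by_halves(uint64_t a, uint64_t b)
{
    pair32 pa = split64(a), pb = split64(b);
    pair32 r = { pa.lo & pb.lo, pa.hi & pb.hi };
    return join64(r);
}
```

The same half-by-half form applies to OR and XOR, since none of these operations carries information between the two halves.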
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #18 from Bernd Edlinger ---
Created attachment 39898
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39898&action=edit
proposed patch
This disables problematic di patterns when no fpu is used, and
there is absolutely no chance t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
wilco at gcc dot gnu.org changed:
What|Removed |Added
CC||wilco at gcc dot gnu.org
--- C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #16 from Bernd Edlinger ---
Wow.
Look at this:
Index: arm.md
===
--- arm.md (revision 241539)
+++ arm.md (working copy)
@@ -448,7 +448,7 @@
(plus:DI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #15 from Bernd Edlinger ---
(In reply to Wilco from comment #14)
> (In reply to Bernd Edlinger from comment #13)
> > I am still trying to understand why thumb1 seems to outperform thumb2.
> >
> > Obviously thumb1 does not have the sh
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #14 from Wilco ---
(In reply to Bernd Edlinger from comment #13)
> I am still trying to understand why thumb1 seems to outperform thumb2.
>
> Obviously thumb1 does not have the shiftdi3 pattern,
> but even if I remove these from thum
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #13 from Bernd Edlinger ---
I am still trying to understand why thumb1 seems to outperform thumb2.
Obviously thumb1 does not have the shiftdi3 pattern,
but even if I remove these from thumb2, the result is still
not on par with thumb2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #12 from Wilco ---
It looks like we need a different approach, I've seen the extra SETs use up
more registers in some cases, and in other cases being optimized away early
on...
Doing shift expansion at the same time as all other DI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #11 from Bernd Edlinger ---
Author: edlinger
Date: Mon Oct 17 17:46:59 2016
New Revision: 241273
URL: https://gcc.gnu.org/viewcvs?rev=241273&root=gcc&view=rev
Log:
2016-10-17 Bernd Edlinger
PR target/77308
* confi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
Wilco changed:
What|Removed |Added
CC||wdijkstr at arm dot com
--- Comment #10 from Wil
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
ktkachov at gcc dot gnu.org changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #8 from Bernd Edlinger ---
Analyzing the different thumb1/2 reload dumps,
I see t2 often uses code like this to access spill slots:
(insn 11576 8090 9941 5 (set (reg:SI 3 r3 [11890])
(plus:SI (reg/f:SI 13 sp)
(con
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #7 from Bernd Edlinger ---
Even more surprising is that:
While thumb2 code (-march=armv6t2 -mthumb) has about the same stack size
as arm code (-marm), thumb1 code has only 1588 bytes stack, and it does
not change with -fno-schedule
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
Bernd Edlinger changed:
What|Removed |Added
CC||vmakarov at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #5 from Bernd Edlinger ---
Now I try to clear the out register when the shift < 32
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c (revision 239624)
+++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #4 from Bernd Edlinger ---
hmm, when I compare aarch64 vs. arm sha512.c.260r.reload
with -O3 -fno-schedule-insns
I see a big difference:
aarch64 has only a few spill regs
subreg regs:
Slot 0 regnos (width = 8): 856
Slot 1 reg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
Bernd Edlinger changed:
What|Removed |Added
CC||bernd.edlinger at hotmail dot de
--- C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #2 from Andrew Pinski ---
For aarch64, the stack size is just 208 bytes.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #1 from Andrew Pinski ---
Does -fno-schedule-insns help? Sometimes the scheduler before the register
allocator causes register pressure and forces more register spills.