0, 1
sxtw x0, w0
eor x1, x0, x0, asr 63
sub x1, x1, x0, asr 63
mov x0, x1
ret
After:
adds w0, w0, 1
csneg w0, w0, w0, pl
ret
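For illustration, C source along these lines produces this kind of sequence
(a sketch, not the actual testcase from the patch):

/* Illustrative only: an int add feeding a widened abs.  */
long long
f (int x)
{
  long long t = x + 1;    /* add w0, w0, 1; sxtw x0, w0 */
  return t < 0 ? -t : t;  /* eor/sub idiom before; adds+csneg after */
}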
ChangeLog:
2015-03-03 Wilco Dijkstra
* gcc/config/aarch64/aarch64.md (absdi2): opti
This patch makes aarch64_min_divisions_for_recip_mul configurable for float
and double. This allows CPUs with really fast or multiple dividers to return
3 (or even 4) if that happens to be faster overall.
No code generation change - bootstrap & regression OK.
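To illustrate what the threshold controls (a sketch, not from the patch -
the transformation itself also requires -ffast-math/-freciprocal-math):

/* Once the number of divisions by the same value reaches the hook's
   threshold, GCC computes the reciprocal once and multiplies instead.  */
void
scale3 (float *a, float *b, float *c, float y)
{
  *a /= y;   /* becomes t = 1.0f / y; *a *= t; ...  */
  *b /= y;
  *c /= y;   /* three divisions: converted only if the hook returns <= 3 */
}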
ChangeLog:
2015-03-03 W
> Andrew Pinski wrote:
> On Tue, Mar 3, 2015 at 10:06 AM, Wilco Dijkstra wrote:
> > This patch makes aarch64_min_divisions_for_recip_mul configurable for float
> > and double. This allows
> > CPUs with really fast or multiple dividers to return 3 (or even 4) if that
> Maxim Kuvyrkov wrote:
>
> You are removing the 2nd alternative that generates "abs" with your patch.
> While I agree that
> using "csneg" is faster on all implementations, can you say the same for
> "abs"? Especially
> given the fact that csneg requires 4 operands instead of abs'es 2?
Yes, g
> Maxim Kuvyrkov wrote:
> > On Mar 4, 2015, at 3:30 PM, Wilco Dijkstra wrote:
> >
> >> Maxim Kuvyrkov wrote:
> >>
> >> You are removing the 2nd alternative that generates "abs" with your patch.
> >> While I agree that
> >
Include the cost of op0 and op1 in all cases in PLUS and MINUS in
aarch64_rtx_costs.
Bootstrap & regression OK.
ChangeLog:
2015-03-04 Wilco Dijkstra
* gcc/config/aarch64/aarch64.c (aarch64_rtx_costs):
Calculate cost of op0 and op1 in PLUS and MINUS cases.
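As a toy model of the change (hypothetical types, not GCC's actual code):

/* Recursive expression costing where PLUS/MINUS now also accumulate the
   cost of computing both operands instead of treating them as free.  */
struct expr { int is_leaf; int own_cost; struct expr *op0, *op1; };

static int
cost_model (const struct expr *e)
{
  if (!e->is_leaf)  /* a PLUS or MINUS node */
    return e->own_cost + cost_model (e->op0) + cost_model (e->op1);
  return e->own_cost;
}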
---
gcc/co
> Jeff Law wrote:
> On 02/26/15 10:30, Wilco Dijkstra wrote:
> > Several GCC versions ago a conditional negate optimization was introduced
> > as a workaround for
> > PR45685. However the branchless expansion for conditional negate is
> > extremely ine
This patch fixes the shift costs for Cortex-A53 so they are more accurate -
immediate shifts use SBFM/UBFM which take 2 cycles, register-controlled
shifts take 1 cycle. Bootstrap and regression OK.
ChangeLog:
2015-03-05 Wilco Dijkstra
* gcc/config/arm/aarch-cost-tables.h
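For illustration, the shape of such a cost entry (field names made up here,
not the actual ones in aarch-cost-tables.h):

/* Sketch: Cortex-A53 implements immediate shifts as SBFM/UBFM (2 cycles),
   while register-controlled shifts take 1 cycle.  */
struct alu_cost_sketch
{
  int shift_reg;  /* register-controlled shift: 1 cycle  */
  int shift_imm;  /* immediate shift (SBFM/UBFM): 2 cycles */
};
static const struct alu_cost_sketch cortexa53_alu_sketch = { 1, 2 };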
> So, OK with the testcase moved into gcc.target/i386/
I've moved it and changed the compile condition:
/* { dg-do compile { target { ! { ia32 } } } } */
Jiong, can you commit this please?
Wilco
2015-03-06 Wilco Dijkstra
* gcc/tree-ssa-phiopt.c (neg_replacement): Remove.
Wilco Dijkstra
* gcc/config/aarch64/aarch64-protos.h (tune_params):
Add reassociation tuning parameters.
* gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH):
Define.
(aarch64_reassociation_width): New function.
(generic_tunings): Add reassociation
, however it is the right thing to do for any constant, including constants
in literal pools (which are typically not legitimate). Also use ALL_REGS
rather than GENERAL_REGS as ALL_REGS has the correct floating point
register costs.
ChangeLog:
2014-10-29 Wilco Dijkstra
* gcc/ira-costs.c
is to disable lrint/llrint on double if the size of a long is
smaller (i.e. ILP32).
Passes regress and bootstrap on AArch64. OK for commit?
ChangeLog
2018-11-13 Wilco Dijkstra
gcc/
PR target/81800
* gcc/config/aarch64/aarch64.md (lrint): Disable lrint pattern i
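To illustrate the problem (a sketch, not the testcase from the PR):

#include <math.h>

/* With a 32-bit long, expanding this inline via a 64-bit convert silently
   truncates out-of-range results instead of raising FE_INVALID the way
   the libm call does - hence the pattern is disabled for ILP32.  */
long
to_long (double x)
{
  return lrint (x);
}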
, OK for commit?
ChangeLog:
2018-11-09 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.
testsuite/
* gcc.target/aarch64/symbol-range.c (foo): Set new limit.
* gcc.target/aarch64/symbol-r
on AArch64 and x86-64. Inspected x86, Arm,
Thumb-1 and Thumb-2 assembler which looks correct.
ChangeLog:
2018-12-07 Wilco Dijkstra
gcc/
PR middle-end/64242
* builtins.c (expand_builtin_longjmp): Add frame clobbers and schedule
block.
(expand_builtin_nonlocal_
seems incorrect since the helper function moves the frame pointer value
into the static chain register (so this patch does nothing to make it
better or worse).
AArch64 bootstrap OK, new test passes on AArch64, x86-64 and Arm.
ChangeLog:
2018-12-13 Wilco Dijkstra
gcc/
PR middle-end/
Fix the alignment option parser to always allow up to 4 alignments.
Now -falign-functions=16:8:8:8 no longer reports an error.
OK for commit (and backport to GCC9)?
ChangeLog:
2019-05-30 Wilco Dijkstra
PR driver/90684
* gcc/opts.c (parse_and_check_align_values): Allow 4
With -mcpu=generic the function alignment is currently 8, however almost all
supported cores prefer 16 or higher, so increase the default to 16:12.
This gives ~0.2% performance increase on SPECINT2017, while codesize is 0.12%
larger.
ChangeLog:
2019-05-31 Wilco Dijkstra
* config
Hi Steve,
> I have no objection to the change but could the commit message and/or
> comments be expanded to explain the ':12' part of this value. I
> couldn't find an explanation for it in the code and I don't understand
> what it does.
See
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html - briefly, in
-falign-functions=16:12 the 12 is a padding limit: alignment to 16 bytes
is only applied when it can be done by skipping fewer than 12 bytes.
Hi Joel,
A few comments below:
+/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
+ power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
+ return log2 (n). Otherwise return 0. */
+int
+aarch64_fpconst_pow2_recip (rtx x)
+{
+ REAL_VALUE_TYPE r
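For clarity, a standalone sketch of what this helper tests (plain C on
doubles, not the REAL_VALUE_TYPE-based implementation under review):

#include <math.h>

/* Return n if d is exactly 1/2^n (n >= 1), otherwise -1.  */
static int
recip_pow2_exponent (double d)
{
  int exp;
  if (d <= 0.0 || frexp (d, &exp) != 0.5)  /* mantissa must be exactly 0.5 */
    return -1;
  /* d == 0.5 * 2^exp == 2^(exp-1), so d == 1/2^n when n == 1 - exp.  */
  return exp <= 0 ? 1 - exp : -1;
}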
Hi Sylvia,
-(define_insn "ldr_got_tiny"
+(define_insn "ldr_got_tiny_di"
[(set (match_operand:DI 0 "register_operand" "=r")
- (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")]
- UNSPEC_GOTTINYPIC))]
+ (unspec:DI
+ [(match_operand:DI 1 "aarch64_vali
Clear the input array to avoid the testcase accidentally
passing with an incorrect frame pointer.
Committed as obvious.
ChangeLog:
2019-06-17 Wilco Dijkstra
testsuite/
PR middle-end/64242
* gcc.c-torture/execute/pr64242.c: Improve test.
--
diff --git a/gcc/testsuite/gcc.c
Hi Jeff,
> So I like the significant simplification here. My worry is whether or
> not this is, in effect, an ABI change. ie, would we be able to mix and
> match .o files from before/after this change which used the builtin
> setjmp/longjmp bits?
No it's not an ABI change. It does affect the va
Hi,
And a few more comments:
> +/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of
> +   a power of 2 (i.e 1/2^n) return the number of float bits. e.g. for
> +   x==(1/2^n) return n. Otherwise return -1. */
> +int
> +aarch64_fpconst_pow2_recip (rtx x)
> +{
> +
Hi Max,
> The testcase from the patch passes with the trunk xtensa-linux-gcc
> with windowed ABI. But with the changes in this patch a lot of tests
> that use longjmp are failing on xtensa-linux.
Interesting. I looked at the _xtensa_nonlocal_goto implementation in
libgcc/config/xtensa/lib2funcs.
Hi,
> > Is this test valid? Can jmp buffer be allowed on stack?
>
> Sure, the contents of the jmp buffer are only valid during the lifetime
> of the call frame anyway.
Indeed. The issue where a jmp buffer on the stack caused an incorrect
restore when doing longjmp has just been fixed (PR64242)
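A sketch of the valid use being discussed - the jmp buffer lives on the
stack and longjmp runs while the owning frame is still live:

#include <setjmp.h>
#include <stdio.h>

static void
inner (jmp_buf env)
{
  longjmp (env, 1);  /* caller's frame is still live: well-defined */
}

int
main (void)
{
  jmp_buf env;       /* on main's stack */
  if (setjmp (env) == 0)
    inner (env);
  else
    puts ("returned via longjmp");
  return 0;
}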
Hi Max,
> It would work if a frame pointer was initialized in the function test, but
> it wasn't:
Right, because it unwinds, it needs a valid frame pointer since we no
longer store the stack pointer. So xtensa_frame_pointer_required
should do something like:
if (cfun->machine->accesses_prev_fr
Hi Max,
> On Tue, Jun 18, 2019 at 4:53 PM Wilco Dijkstra wrote:
> > > It would work if a frame pointer was initialized in the function test, but
> > > it wasn't:
> >
> > Right, because it unwinds, it needs a valid frame pointer since we no
Hi Ayan,
Have you seen https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481?
Adding support for a generic bitreverse builtin would be very useful
since LLVM already supports this.
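For reference, a sketch of the bit-twiddling such a builtin would replace
(LLVM exposes it as __builtin_bitreverse32):

/* Classic 32-bit bit reversal: swap adjacent bits, then pairs, then
   nibbles, then reverse the byte order.  */
unsigned
bitrev32 (unsigned x)
{
  x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
  x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
  x = ((x & 0x0f0f0f0fu) << 4) | ((x >> 4) & 0x0f0f0f0fu);
  return (x << 24) | ((x & 0xff00u) << 8) | ((x >> 8) & 0xff00u) | (x >> 24);
}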
Wilco
Hi,
Florian wrote:
> For userland, I would like to eventually copy the OpenBSD approach for
> architectures which have some form of PC-relative addressing: we can
> have multiple random canaries in (RELRO) .rodata in sufficiently close
> to the code that needs them (assuming that we have split .ro
Log:
2018-12-07 Wilco Dijkstra
gcc/
PR middle-end/64242
* builtins.c (expand_builtin_longjmp): Add frame clobbers and schedule
block.
(expand_builtin_nonlocal_goto): Likewise.
testsuite/
PR middle-end/64242
* gcc.c-torture/execute/pr64242.c: Update test.
--
Hi,
Jakub Jelinek wrote:
> On Fri, Dec 07, 2018 at 02:52:48PM +0000, Wilco Dijkstra wrote:
>> - struct __attribute__((aligned (32))) S { int a[4]; } s;
>>
Hi,
Jakub Jelinek wrote:
> On Fri, Dec 07, 2018 at 04:19:22PM +0000, Wilco Dijkstra wrote:
>> The test case doesn't need an aligned object to fail, so why did you add it?
>
> It needed it on i686, because otherwise it happened to see the value it
> wanted in the caller's
Hi,
>> Ultimately, the best solution here will probably depend on which we
>> think is more likely, copysign or the example I give above.
> I'd tend to suspect we'd see more pure integer bit twiddling than the
> copysign stuff.
All we need to do is to clearly separate the integer and FP/SIMD case
Hi Oliver,
+#define FIXED_R18 0
{ \
0, 0, 0, 0, 0, 0, 0, 0, /* R0 - R7 */ \
0, 0, 0, 0, 0, 0, 0, 0, /* R8 - R15 */ \
- 0, 0, 0, 0, 0, 0, 0, 0, /* R16 - R23 */ \
+ 0, 0, FIXED_R18, 0, 0, 0, 0, 0, /* R16 - R23 */ \
Hi,
>> On 12 Dec 2018, at 18:21, Richard Earnshaw (lists)
>> wrote:
>
>> However, that introduces an issue that that
>> code is potentially used across multiple versions of gcc, with
>> potentially different choices of the static chain register. Hmm, this
>> might need some more careful though
Hi Martin,
> Does a non-executable stack actually improve security?
Absolutely, it's like closing your front door rather than just leaving it
open for anyone.
> For the alternative implementation using (custom) function
> descriptors (-fno-trampolines) the static chain becomes
> part of the ABI or
Hi Martin,
Uecker, Martin wrote:
> On Wednesday, 12.12.2018 at 22:04 +0000, Wilco Dijkstra wrote:
>> Hi Martin,
>>
>> > Does a non-executable stack actually improve security?
>>
>> Absolutely, it's like closing your front door rather than just leave i
Hi Martin,
> One could also argue that it creates a false sense of security
> and diverts resources from properly fixing the real problems
> i.e. the buffer overflows which lets an attacker write to the
> stack in the first place. A program without buffer overflows
> is secure even without an exec
seems incorrect since the helper
function moves the the frame pointer value into the static chain register
(so this patch does nothing to make it better or worse).
AArch64 bootstrap OK, new test passes on AArch64, x86-64 and Arm.
ChangeLog:
2018-12-13 Wilco Dijkstra
gcc/
PR middle-end/
Hi Hans-Peter,
> While the choice of static-chain register does not affect the
> ABI, it's the other way round: the choice of static-chain
> register matters, specifically it's call-clobberedness.
Agreed.
> It looks like the current aarch64 static-chain register R18 is
> call-saved but without s
Hi,
Jakub Jelinek wrote:
> On Wed, Dec 19, 2018 at 07:53:48PM +0000, Uecker, Martin wrote:
>> What do you think about making the trampoline a single call
>> instruction and have a large memory region which is the same
>> page mapped many times?
This sounds like a good idea, but given a function d
Hi Martin,
> There is a similar mechanism for pointer-to-member-functions
> used by C++. Is this correct on aarch64?
/* By default, the C++ compiler will use the lowest bit of the pointer
to function to indicate a pointer-to-member-function points to a
virtual member function. However, if
Hi Olivier,
> I'm experimenting with the idea of adjusting the
> stack probing code using r9 today, to see if it could
> save/restore that reg if it happens to be the static chain
> as well.
>
> If that can be made to work, maybe that would be a better
> alternative than just swapping and have the
Hi Sam,
This is a trivial test fix, so it falls under the obvious rule and can be
committed without approval - https://www.gnu.org/software/gcc/svnwrite.html
Cheers,
Wilco
Hi Richard,
> I think this should be "lk*r", not "l*rk". SP is only going to crop up
> in rare circumstances, but we are always going to need this pattern if
> it does and hiding this from register preferencing is pointless. It's
> not like the compiler is going to start allocating SP in the
ortex-a57
ChangeLog:
2019-07-29 Wilco Dijkstra
* config/arm/arm.c (arm_option_override): Don't override sched
pressure algorithm.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
Hi all,
>On 30/07/2019 10:31, Ramana Radhakrishnan wrote:
>> On 30/07/2019 10:08, Christophe Lyon wrote:
>>> Hi Wilco,
>>>
>>> Do you know which benchmarks were used when this was checked-in?
>>> It isn't clear from
>>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html
>>
>> It
K on arm-none-linux-gnueabihf --with-cpu=cortex-a57,
committed as obvious.
[1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01579.html
ChangeLog:
2019-07-30 Wilco Dijkstra
* config/arm/thumb2.md (thumb2_movsi_insn): Adjust literal offset.
* config/arm/vfp.md (thumb2_movsi
7-18 Wilco Dijkstra
* config/arm/arm.md (split and/eor/ior): Remove Neon check.
(split not): Add DImode not splitter.
(anddi3): Remove pattern.
(anddi3_insn): Likewise.
(anddi_zesidi_di): Likewise.
(anddi_sesdi_di): Likewise.
(anddi_notd
01301.html
ChangeLog:
2019-07-19 Wilco Dijkstra
* config/arm/iterators.md (qhs_extenddi_cstr): Update.
(qhs_extenddi_cstr): Likewise.
* config/arm/arm.md (ashldi3): Always expand early.
(ashlsi3): Likewise.
(ashrsi3): Likewise.
(zero_extenddi2): R
removed.
Code generation is improved in all cases, saving another 400-500 instructions
from the PR77308 testcase (total improvement is over 1700 instructions with
-mcpu=cortex-a57 -O2).
Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
ChangeLog:
2019-07-19 W
ping
With -mcpu=generic the function alignment is currently 8, however almost all
supported cores prefer 16 or higher, so increase the default to 16:12.
This gives ~0.2% performance increase on SPECINT2017, while codesize is 0.12%
larger.
ChangeLog:
2019-05-31 Wilco Dijkstra
ch64, passes regress, OK for commit?
ChangeLog:
2018-11-09 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.
testsuite/
* gcc.target/aarch64/symbol-range.c (foo): Set new l
The fix is to disable lrint/llrint on double if the size of a long is
smaller (i.e. ILP32).
Passes regress and bootstrap on AArch64. OK for commit?
ChangeLog
2018-11-13 Wilco Dijkstra
gcc/
PR target/81800
* gcc/config/aarch64/aarch64.md (lrint): Disable
Fix pr89330_0.C test by adding missing effective target shared.
Committed as obvious.
ChangeLog:
2019-08-01 Wilco Dijkstra
* gcc/testsuite/g++.dg/lto/pr89330_0.C: Add effective-target shared.
--
diff --git a/gcc/testsuite/g++.dg/lto/pr89330_0.C
b/gcc/testsuite/g++.dg/lto/pr89330_0.C
Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and
popcount (x) == 1 into (x-1)
gcc/
PR middle-end/90693
* match.pd: Add popcount simplifications.
testsuite/
PR middle-end/90693
* gcc.dg/fold-popcount-5.c: Add new test.
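A sketch of what the first fold catches (hypothetical user code, not the
testcase from the patch):

/* popcount (x) > 1 asks "is more than one bit set"; clearing the lowest
   set bit with x & (x-1) answers that without counting bits.  */
int
multiple_bits_set (unsigned x)
{
  return __builtin_popcount (x) > 1;  /* folds to (x & (x-1)) != 0 */
}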
---
diff --git a/gcc/match.p
hf --with-cpu=cortex-a57
ChangeLog:
2019-07-29 Wilco Dijkstra
* config/arm/arm.c (arm_option_override): Don't override sched
pressure algorithm.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
81286cadf32f908e045d704128
nces.
Bootstrapped on AArch64, passes regress, OK for commit?
ChangeLog:
2018-11-09 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.
testsuite/
* gcc.target/aarch64/symbol-range.c
mmit?
ChangeLog:
2019-07-18 Wilco Dijkstra
* config/arm/arm.md (split and/eor/ior): Remove Neon check.
(split not): Add DImode not splitter.
(anddi3): Remove pattern.
(anddi3_insn): Likewise.
(anddi_zesidi_di): Likewise.
(anddi_sesd
org/ml/gcc-patches/2019-07/msg01301.html
ChangeLog:
2019-07-19 Wilco Dijkstra
* config/arm/iterators.md (qhs_extenddi_cstr): Update.
(qhs_extenddi_cstr): Likewise.
* config/arm/arm.md (ashldi3): Always expand early.
(ashlsi3): Likewise.
eLog:
2019-07-19 Wilco Dijkstra
* config/arm/arm.md (neon_for_64bits): Remove.
(avoid_neon_for_64bits): Remove.
(arm_adddi3): Always split early.
(arm_subdi3): Always split early.
(negdi2): Remove Neon expansion.
(split zero_ex
Hi Richard,
> >
> > I think this should be in expand stage where there could be comparison
> > of the cost of the RTLs.
>
> I tend to agree here, if not then for the reason the "simplified" variants
> have more GIMPLE stmts which means they are not "simpler". In
> fact I'd argue for canonicaliza
-08-22 Wilco Dijkstra
* gcc/config/arm/arm.opt (mneon-for-64bits): Deprecate.
* gcc/config/arm/arm.h (TARGET_PREFER_NEON_64BITS): Remove.
(prefer_neon_for_64bits): Remove.
* gcc/config/arm/arm.c (prefer_neon_for_64bits): Remove.
(tune_params): Remove
s option is deprecated and has no effect.
@item -mslow-flash-data
@opindex mslow-flash-data
Updated patch:
Deprecate -mneon-for-64bits since it no longer has any effect after
the DImode codegen improvements.
OK for commit?
ChangeLog:
2019-08-23 Wilco Dijkstra
* gcc/doc/invoke.
Hi Christophe,
> After this was committed (r274823), I've noticed 2 regressions on arm*:
> FAIL: gcc.target/arm/pr53447-5.c scan-assembler-times (ldrd|vldr\\.64) 20
> FAIL: gcc.target/arm/pr53447-5.c scan-assembler-times (strd|vstr\\.64) 18
>
> Does this test still pass for you?
You're right, t
memory operands and immediates are handled more efficiently.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-08-29 Wilco Dijkstra
* config/arm/arm.md (anddi3): Expand explicitly.
(iordi3): Likewise.
(xordi3): Likewise.
(one_cmpldi2): Likewise
Hi Maxim,
> It appears that cores with autoprefetcher hardware prefer loads and stores
> bundled together, not interspersed with other instructions to occupy the
> rest of CPU units.
I don't believe it is as simple as that - modern cores have multiple
prefetchers but won't prefer bund
Hi Alexander,
> So essentially the main issue is not a hardware peculiarity, but rather the
> bad schedule being totally wrong (it could only make sense if loads had
> 1-cycle latency, which they do not).
The scheduling is only bad because the specific intrinsics used are mapped
onto asm stat
nces.
Bootstrapped on AArch64, passes regress, OK for commit?
ChangeLog:
2018-11-09 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.
testsuite/
* gcc.target/aar
one-linux-gnueabihf --with-cpu=cortex-a57
ChangeLog:
2019-07-29 Wilco Dijkstra
* config/arm/arm.c (arm_option_override): Don't override sched
pressure algorithm.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
81286cadf32f908e045d704128
Remove various MULS/MLAS patterns which are enabled when optimizing for
size. However the codesize gain from these patterns is so minimal that
there is no point in keeping them.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-09-03 Wilco Dijkstra
* config/arm/arm.md
Cleanup the 32-bit multiply patterns. Merge the pre-Armv6 with the Armv6
patterns, remove useless alternatives and order the accumulator operands
to prefer MLA Ra, Rb, Rc, Ra whenever feasible.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-09-03 Wilco Dijkstra
* config/arm
Cleanup the various highpart multiply patterns using iterators.
As a result the signed and unsigned variants and the pre-Armv6
multiply operand constraints are all handled in a single pattern
and simple expander.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-09-03 Wilco Dijkstra
other DImode operations splitting early.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-09-03 Wilco Dijkstra
* config/arm/arm.md (maddsidi4): Remove expander.
(mulsidi3adddi): Remove pattern.
(mulsidi3adddi_v6): Likewise.
(mulsidi3_nov6): Likewise
Hi Maxim,
> > Autoprefetching heuristic is enabled only for cores that support it, and
> > isn't active by default.
>
> It's enabled on most cores, including the default (generic). So we do have
> to be careful that this doesn't regress any other benchmarks or do worse
> on modern cores
nce llrint now also ignores FE_INVALID exceptions!
The fix is to disable lrint/llrint on double if the size of a long is
smaller (i.e. ILP32).
ChangeLog
2018-11-13 Wilco Dijkstra
gcc/
PR target/81800
* gcc/config/aarch64/aarch64.md (lrint): Disable lrint pattern if GPF
Hi Richard,
>What I have not done, but is now a possibility, is to use a custom
>calling convention for the out-of-line routines. I now only clobber
>2 (or 3, for TImode) temp regs and set a return value.
This would be a great feature to have since it reduces the overhead of
outlinin
Hi,
+(simplify
+ (convert
+  (rshift
+   (mult
> is the outer convert really necessary? That is, if we change
> the simplification result to
Indeed that should be "convert?" to make it optional.
> Is the Hamming weight popcount
> faster than the libgcc table-based approach? I wonder if
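For reference, the SWAR popcount idiom the pattern is matching - the final
multiply-and-shift is the (convert (rshift (mult ...))) step; a 32-bit
sketch:

unsigned
popcount32 (unsigned x)
{
  x -= (x >> 1) & 0x55555555u;                       /* 2-bit sums */
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* 4-bit sums */
  x = (x + (x >> 4)) & 0x0f0f0f0fu;                  /* 8-bit sums */
  return (x * 0x01010101u) >> 24;                    /* add the bytes */
}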
SPECFP improves 0.2%.
Bootstrap OK, OK for commit?
ChangeLog:
2019-09-09 Wilco Dijkstra
* config/arm/arm.c (arm_legitimize_address): Remove Thumb-2 bailout.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
a5a6a0fab1b4b7ef07931522e7d47e59842
?
ChangeLog:
2019-09-09 Wilco Dijkstra
* config/arm/arm.h (HONOR_REG_ALLOC_ORDER): Set when optimizing for
size.
--
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
8d023389eec469ad9c8a4e88edebdad5f3c23769..e3473e29fbbb964ff1136c226fbe30d35dbf7b39
100644
--- a/gcc/config/arm
ess OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
ChangeLog:
2019-07-29 Wilco Dijkstra
* config/arm/arm.c (arm_option_override): Don't override sched
pressure algorithm.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/
which ensure
memory operands and immediates are handled more efficiently.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-08-29 Wilco Dijkstra
* config/arm/arm.md (anddi3): Expand explicitly.
(iordi3): Likewise.
(xordi3): Likewise.
(one_cmpldi2
and its references.
Bootstrapped on AArch64, passes regress, OK for commit?
ChangeLog:
2018-11-09 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.
tests
ping
Remove various MULS/MLAS patterns which are enabled when optimizing for
size. However the codesize gain from these patterns is so minimal that
there is no point in keeping them.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-09-03 Wilco Dijkstra
* config
ping
Cleanup the 32-bit multiply patterns. Merge the pre-Armv6 with the Armv6
patterns, remove useless alternatives and order the accumulator operands
to prefer MLA Ra, Rb, Rc, Ra whenever feasible.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-09-03 Wilco Dijkstra
Wilco Dijkstra
* config/arm/arm.md (smulsi3_highpart): Use and iterators.
(smulsi3_highpart_nov6): Remove pattern.
(smulsi3_highpart_v6): Likewise.
(umulsi3_highpart): Likewise.
(umulsi3_highpart_nov6): Likewise.
(umulsi3_highpart_v6
subreg issues due to other DImode operations splitting early.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-09-03 Wilco Dijkstra
* config/arm/arm.md (maddsidi4): Remove expander.
(mulsidi3adddi): Remove pattern.
(mulsidi3adddi_v6): Likewise
or commit?
ChangeLog:
2019-09-11 Wilco Dijkstra
* config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
8b92c830de09a3ad49420fdfacde02d8efc2a89b..11212d988a0f56299c2266bace80170d074be56c
100644
--- a/gcc/config/arm/arm.h
While code hoisting generally improves codesize, it can affect performance
negatively. Benchmarking shows it doesn't help SPEC and negatively affects
embedded benchmarks, so only enable code hoisting with -Os on Arm.
Bootstrap OK, OK for commit?
ChangeLog:
2019-09-11 Wilco Dij
Hi Jeff,
Jeff wrote:
> Just to make sure I understand. Are you saying the addresses for the
> MEMs are equal or the contents of the memory location are equal.
>
> For the former the alignment has to be the same, plain and simple, even
> if GCC isn't aware the alignments have to be the same.
>
> F
Hi Paul,
> > On Sep 11, 2019, at 11:48 AM, Wilco Dijkstra wrote:
> >
> > Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
> > bitfields by their declared type, which results in better code generation
> > on practically any target. So set it
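An illustration (hypothetical example, not from the patch):

/* With SLOW_BYTE_ACCESS set, the bitfield is accessed via its declared
   type - one 32-bit load plus an extract - rather than a byte load.  */
struct flags { unsigned a : 3; unsigned b : 5; };
unsigned
get_b (const struct flags *f)
{
  return f->b;
}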
Hi Jeff,
> We're talking about two instructions where if the first executes, then
> the second also executes. If the memory addresses are the same, then
> their alignment is the same.
>
> In your case the two instructions are on different execution paths and
> are in fact mutually exclusive.
S
Hi Richard,
> Do we document target specific deviations from "default" behavior somewhere?
Not as far as I know. The other option changes in arm-common.c are not mentioned
anywhere, neither is any of arm_option_override_internal.
If we want to keep documentation useful, we shouldn't clutter the
Hi Richard,
>> So what is the behaviour when you explicitly select a specific CPU?
>
> Selecting a specific cpu selects the specific architecture that the cpu
> supports, does it not? Thus the architecture example above still applies.
>
> Unless I don't understand what distinction that you're mak
Hi Prathamesh,
> My only concern with the patch is that the issue isn't specific to
> code-hoisting.
> For this particular case (reproducible with pr77445-2.c), disabling
> jump threading
> doesn't cause the register spill with hoisting enabled.
> Likewise disabling forwprop3 and forwprop4 prevent
Hi Kyrill,
>> When you select a CPU the goal is that we optimize and schedule for that
>> specific microarchitecture. That implies using atomics that work best for
>> that core rather than outlining them.
>
> I think we want to go ahead with this framework to enable the portable
> deployment of L
Hi Christophe,
Can you explain this in more detail - it doesn't make sense to me to force the
Thumb bit during unwinding since it should already be correct, even on a
Thumb-only CPU. Perhaps the kernel code that pushes an incorrect address on
the stack could be fixed instead?
> Without this, when
Hi Richard,
> The issue with the bugzilla is that it lacked appropriate testcase(s) and thus
> it is now a mess. There are clear testcases (maybe not in the benchmarks you
Agreed - it's not clear whether any of the proposed changes would actually
help the original issue. My patch absolutely does
Hi Kyrill,
>> + (mult:SI (match_operand:SI 3 "s_register_operand" "r")
>> + (match_operand:SI 2 "s_register_operand" "r"]
>
> Looks like we'll want to mark operand 2 here with '%' as well?
That doesn't make any difference since both operands are identical.
It only h
Hi Kyrill,
> We should be able to "compress" the above 3 patterns into one using code
> iterators.
Good point, that makes sense. I've committed this:
ChangeLog:
2019-09-18 Wilco Dijkstra
PR target/91738
* config/arm/arm.md (di3): Expand explicitly.