Hi Richard,
Thanks for the review, now committed.
> The new aarch64_split_compare_and_swap code looks a bit twisty.
> The approach in lse.S seems more obvious. But I'm guessing you
> didn't want to spend any time restructuring the pre-LSE
> -mno-outline-atomics code, and I agree the patch in its
Hi Richard,
> + rtx load[max_ops], store[max_ops];
>
> Please either add a comment explaining why 40 is guaranteed to be
> enough, or (my preference) use:
>
> auto_vec, ...> ops;
I've changed to using auto_vec since that should help reduce conflicts
with Alex' LDP changes. I double-checked maxi
Hi Richard,
>> Enable lock-free 128-bit atomics on AArch64. This is backwards compatible
>> with existing binaries, gives better performance than locking atomics and
>> is what most users expect.
>
> Please add a justification for why it's backwards compatible, rather
> than just stating that
Hi Kyrill,
> + /* Reduce the maximum size with -Os. */
> + if (optimize_function_for_size_p (cfun))
> + max_set_size = 96;
> +
> This is a new "magic" number in this code. It looks sensible, but how
> did you arrive at it?
We need 1 instruction to create the value to store (DUP or MO
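(Sketching the arithmetic behind that figure, assuming Q-register stores as in the
MAX_SET_SIZE comment quoted further down: 1 MOVI/DUP to materialise the value plus
3 STP of Q-register pairs covers 3 x 32 = 96 bytes in 4 instructions - roughly the
size of setting up and making a memset call.)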
Hi Kyrill,
> + if (!(hwcap & HWCAP_CPUID))
> + return false;
> +
> + unsigned long midr;
> + asm volatile ("mrs %0, midr_el1" : "=r" (midr));
> From what I recall that midr_el1 register is emulated by the kernel and so
> userspace software has to check that the kernel supports that emula
Hi,
>>> I checked codesize on SPECINT2017, and 96 had practically identical size.
>>> Using 128 would also be a reasonable Os value with a very slight size increase,
>>> and 384 looks good for O2 - however I didn't want to tune these values as this
>>> is a cleanup patch.
>>>
>>> Cheers,
>
Hi Richard,
> +/* Maximum bytes set for an inline memset expansion. With -Os use 3 STP
> + and 1 MOVI/DUP (same size as a call). */
> +#define MAX_SET_SIZE(speed) (speed ? 256 : 96)
> So it looks like this assumes we have AdvSIMD. What about
> -mgeneral-regs-only?
After my strictalign bugf
Hi Richard,
>> Note that aarch64_internal_mov_immediate may be called after reload,
>> so it would end up even more complex.
>
> The sequence I quoted was supposed to work before and after reload. The:
>
> rtx tmp = aarch64_target_reg (dest, DImode);
>
> would create a fresh tempor
The cpymemdi/setmemdi implementation doesn't fully support strict alignment.
Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT.
Clean up the condition for when to use MOPS.
Passes regress/bootstrap, OK for commit?
gcc/ChangeLog/
PR target/103100
* con
A MOPS memmove may corrupt registers since there is no copy of the input
operands to temporary registers. Fix this by calling
aarch64_expand_cpymem_mops.
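For reference, a minimal sketch of the MOPS sequence (register choice illustrative):
the prologue/main/epilogue instructions update the destination, source and size
registers in place, which is why the expander has to copy the original operands
into fresh temporaries first:
    cpyp    [x0]!, [x1]!, x2!    // prologue: x0/x1/x2 are updated in place
    cpym    [x0]!, [x1]!, x2!    // main copy
    cpye    [x0]!, [x1]!, x2!    // epilogue: all three registers are clobbered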
Passes regress/bootstrap, OK for commit?
gcc/ChangeLog/
PR target/21
* config/aarch64/aarch64.md (aarc
Hi Richard,
> * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition.
> Shouldn't this be a separate patch? It's not immediately obvious that this
> is a necessary part of this change.
You mean this?
@@ -1627,7 +1627,7 @@ (define_expand "cpymemdi"
(match_operand:BLK 1 "m
v2: Use UINTVAL, rename max_mops_size.
The cpymemdi/setmemdi implementation doesn't fully support strict alignment.
Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT.
Clean up the condition for when to use MOPS.
Passes regress/bootstrap, OK for commit?
gcc/ChangeLog/
Add support for inline memmove expansions. The generated code is identical
to that for memcpy, except that all loads are emitted before stores rather than
being interleaved. The maximum size is 256 bytes which requires at most 16
registers.
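A hedged sketch of what the expansion aims for on a 64-byte memmove (register
numbers are illustrative): every load is issued before any store, so an overlapping
destination cannot clobber source data that still has to be read:
    ldp     q0, q1, [x1]         // load the whole block first
    ldp     q2, q3, [x1, 32]
    stp     q0, q1, [x0]         // only then store it
    stp     q2, q3, [x0, 32]
At the 256-byte limit this uses q0-q15, matching the 16-register bound above.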
Passes regress/bootstrap, OK for commit?
gcc/ChangeLog
Hi Ramana,
>> __sync_val_compare_and_swap may be used on 128-bit types and either calls the
>> outline atomic code or uses an inline loop. On AArch64 LDXP is only atomic if
>> the value is stored successfully using STXP, but the current implementations
>> do not perform the store if the compa
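A minimal sketch of the corrected inline sequence (register choice illustrative):
the exclusive store is performed on both paths - writing the original value back
when the compare fails - so the 128-bit value returned to the caller is read
atomically:
0:  ldxp    x0, x1, [x2]         // exclusive load of the 16-byte value
    cmp     x0, x4               // compare low half with expected
    ccmp    x1, x5, 0, eq        // and the high half
    b.ne    1f
    stxp    w8, x6, x7, [x2]     // success: store the desired value
    cbnz    w8, 0b
    b       2f
1:  stxp    w8, x0, x1, [x2]     // failure: store the old value back so the
    cbnz    w8, 0b               // LDXP/STXP pair still forms an atomic read
2: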
The outline atomic functions have hidden visibility and can only be called
directly. Therefore we can remove the BTI at function entry. This improves
security by reducing the number of indirect entry points in a binary.
The BTI markings on the objects are still emitted.
Passes regress, OK for c
Hi Ramana,
> Hope this helps.
Yes definitely!
>> Passes regress/bootstrap, OK for commit?
>
> Target ? armhf ? --with-arch , -with-fpu , -with-float parameters ?
> Please be specific.
I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
--build=arm-none-linux-gnueabihf --wit
OK to backport to GCC13 (it applies cleanly and regress/bootstrap passes)?
Cheers,
Wilco
On 29/11/2023 18:09, Richard Sandiford wrote:
> Wilco Dijkstra writes:
>> v2: Use UINTVAL, rename max_mops_size.
>>
>> The cpymemdi/setmemdi implementation doesn't fully support
Hi Richard,
> Doing just this will mean that the register allocator will have to undo a
> pre/post memory operand that was accepted by the predicate (memory_operand).
> I think we really need a tighter predicate (lets call it noautoinc_mem_op)
> here to avoid that. Note that the existing uses
Hi Richard,
> The Linaro CI is reporting an ICE while building libgfortran with this change.
So it looks like Thumb-2 oddly enough restricts the negative range of DFmode
even though that is unnecessary and inefficient. The easiest workaround turned
out to be to avoid using the checked adjust_address.
Cheer
Use UZP1 instead of INS when combining low and high halves of vectors.
UZP1 has 3 operands which improves register allocation, and is faster on
some microarchitectures.
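A hedged illustration of the difference (registers chosen for illustration): INS is
destructive, so the result is tied to one of the sources, while UZP1 has a separate
destination that the register allocator is free to pick:
    ins     v0.d[1], v1.d[0]     // two operands: result must live in v0
    uzp1    v2.2d, v0.2d, v1.2d  // three operands: low halves of v0/v1 go into v2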
Passes regress & bootstrap, OK for commit?
gcc:
* config/aarch64/aarch64-simd.md (aarch64_combine_internal):
Use
Use LDP/STP for large struct types as they have useful immediate offsets and
are typically faster.
This removes differences between little and big endian and allows use of
LDP/STP without UNSPEC.
Passes regress and bootstrap, OK for commit?
gcc:
* config/aarch64/aarch64.cc (aarch64_clas
Add missing '\' in 2-instruction movsi/di alternatives so that they are
printed on separate lines.
Passes bootstrap and regress, OK for commit once stage 1 reopens?
gcc:
* config/aarch64/aarch64.md (movsi_aarch64): Use '\;' to force
newline in 2-instruction pattern.
(movdi
Improve costing of ctz - both TARGET_CSSC and vector cases were not handled yet.
Passes regress & bootstrap - OK for commit?
gcc:
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Improve CTZ costing.
---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index
f
Hi Andrew,
> I should note popcount has a similar issue which I hope to fix next week.
> Popcount cost is used during expand so it is very useful to be slightly more
> correct.
It's useful to set the cost so that all of the special cases still apply - even
if popcount is relatively fast, it's s
Hi Andrew,
A few comments on the implementation, I think it can be simplified a lot:
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -700,8 +700,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE =
> AARCH64_FL_SM_OFF;
> #define DWARF2_UNWIND_INFO 1
>
> /* Use R0 through R3 to pass exception handling
Improve check-function-bodies by allowing single-character function names.
Also skip '#' comments which may be emitted from inline assembler.
Passes regress, OK for commit?
gcc/testsuite:
* lib/scanasm.exp (configure_check-function-bodies): Allow single-char
function names. Skip
Add __ARM_FEATURE_MOPS predefine. Add support for ACLE __arm_mops_memset_tag.
Passes regress, OK for commit?
gcc:
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Add __ARM_FEATURE_MOPS predefine.
* config/aarch64/arm_acle.h: Add __arm_mops_memset_tag().
gc
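A hedged usage sketch of the new intrinsic (prototype as in the ACLE proposal;
feature-test macros assumed as shown):
    #include <stddef.h>
    #include <arm_acle.h>

    #if defined (__ARM_FEATURE_MOPS) && defined (__ARM_FEATURE_MEMORY_TAGGING)
    void *
    zero_and_tag (void *p, size_t len)
    {
      /* Clear the buffer and set the MTE allocation tags from the pointer's tag,
         using the MOPS SETG* instructions.  */
      return __arm_mops_memset_tag (p, 0, len);
    }
    #endif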
Hi Richard,
> I think this should be in a push_options/pop_options block, as for other
> intrinsics that require certain features.
But then the intrinsic would always be defined, which is contrary to what the
ACLE spec demands - it would not give a compilation error at the callsite
but give assem
The valid offset range of LDRD in arm_legitimate_index_p is increased to
-1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode.
Fix this by moving the LDRD check earlier.
Passes bootstrap & regress, OK for commit?
gcc:
PR target/115153
* config/arm/arm.cc (arm
A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
printed as LDR/STR with writeback in unified syntax, resulting in strange
assembler errors if writeback is selected. To work around this, use the 'Uw'
constraint that blocks writeback.
Passes bootstrap & regress, OK for
Fix CPU features initialization. Use HWCAP rather than explicit accesses
to CPUID registers. Perform the initialization atomically to avoid multi-
threading issues.
Passes regress, OK for commit and backport?
libgcc:
PR target/115342
* config/aarch64/cpuinfo.c (__init_cpu_featu
Hi Richard,
I've reworded the commit message a bit:
The CPU features initialization code uses CPUID registers (rather than
HWCAP). The equality comparisons it uses are incorrect: for example FEAT_SVE
is not set if SVE2 is available. Using HWCAPs for these is both simpler and
correct. The initi
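A hedged sketch of the HWCAP-style check argued for here (constants from
<asm/hwcap.h>, function name illustrative): a bitwise test on HWCAP remains
correct when a newer extension implies the feature, where an equality test on a
CPUID field does not:
    #include <sys/auxv.h>
    #include <asm/hwcap.h>

    static inline int
    have_sve (void)
    {
      /* The kernel sets HWCAP_SVE whenever any SVE level is implemented,
         so this is also true on an SVE2 system.  */
      return (getauxval (AT_HWCAP) & HWCAP_SVE) != 0;
    }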
Hi Richard,
>> Essentially anything covered by HWCAP doesn't need an explicit check. So I
>> kept the LS64 and PREDRES checks since they don't have a HWCAP allocated
>> (I'm not entirely convinced we need these, let alone having 3 individual
>> bits for LS64, but that's something for the A
Hi Christophe,
> PR target/115153
I guess this is typo (should be 115188) ?
Correct.
> +/* { dg-options "-O2 -mthumb" } */
> -mthumb is included in arm_arch_v6m, so I think you don't need to add it here?
Indeed, it's not strictly necessary. Fixed in v2:
A Thumb-1 memory operand allows
v2: use a new arm_arch_v7ve_neon, fix use of DImode in output_move_neon
The valid offset range of LDRD in arm_legitimate_index_p is increased to
-1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode.
Fix this by moving the LDRD check earlier.
Passes bootstrap & regress, OK for
According to documentation, '^' should only have an effect during reload.
However ira-costs.cc treats it in the same way as '?' during early costing.
As a result using '^' can accidentally disable valid alternatives and cause
significant regressions (see PR114741). Avoid this by ignoring '^' duri
James Greenhalgh wrote:
> If we don't have any targets which care about the fccmps/fccmpd split in
> the code base, do we really need it? Can we just follow the example of
> fcsel?
If we do that then we should also change fcmps/d to fcmp to keep the f(c)cmp
attributes orthogonal. However it seems
On 12/16/2015 03:30 PM, Evandro Menezes wrote:
>
> On 10/30/2015 05:24 AM, Marcus Shawcroft wrote:
>
> On 20 October 2015 at 00:40, Evandro Menezes wrote:
>
> In the existing targets, it seems that it's always faster to zero up a DF
> register with "movi %d0,
cno FP_REGS
a1 (r79,l0) best FP_REGS, allocno FP_REGS
As a result it is now no longer a requirement to use register move costs that
are larger than the memory move cost. So it will be feasible to use realistic
costs for both without a huge penalty.
ChangeLog:
2016-01-22 Wilco Dijk
Richard Henderson wrote:
> On 01/25/2016 05:28 AM, Christophe Lyon wrote:
> > After this, I'm seeing this test now FAILs:
> > gcc.target/aarch64/ccmp_1.c scan-assembler adds\t
>
> That test case is badly written. In addition to that one, several of the
> other
> failures that I see within that fi
;wzr' on cmp - BTW is there a
regular expression that correctly implements (0|xzr)? If I use that the test
still fails somehow but \[0wzr\]+ works fine... Is the correct syntax
documented somewhere?
Finally to ensure FCCMPE is emitted on relational compares, add
-ffinite-math-only.
Cha
ping
From: Wilco Dijkstra
Sent: 16 December 2015 11:37
To: Richard Biener; James Greenhalgh
Cc: GCC Patches; nd
Subject: RE: [PATCH][AArch64] Add vector permute cost
Richard Biener wrote:
> On Wed, Dec 16, 2015 at 10:32 AM, James Greenhalgh
>
ping (note the regressions discussed below are addressed by
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01761.html)
From: Wilco Dijkstra
Sent: 17 December 2015 13:37
To: James Greenhalgh
Cc: gcc-patches@gcc.gnu.org; nd
Subject: RE: [PATCH][AArch64] Add
ping
> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 19 November 2015 18:12
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH][ARM] Enable fusion of AES instructions
>
> Enable instruction fusion of AES instructions on ARM for Cor
I've added myself to the "Write After Approval" maintainers (Committed revision
232880):
Index: ChangeLog
===
--- ChangeLog (revision 232874)
+++ ChangeLog (working copy)
@@ -1,3 +1,7 @@
+2015-01-27
James Greenhalgh wrote:
> I'm still seeing:
>
> FAIL: gcc.target/aarch64/ccmp_1.c scan-assembler-times \\tcmp\\tw[0-9]+,
> (0|wzr) 4
That's because "(0|wzr)" is not correctly matching due to the weird regular
expression syntax used in the testsuite (I tried with several escapes to no
avail). I
Fix the ccmp_1.c test back to use '0' as regular expressions don't work
correctly. '0' is right due to compare with zero now printing as 'CMP w0, 0'
rather than 'CMP w0, wzr' (since r232921).
Committed as trivial patch in r233102.
ChangeLog
don't see why the backend should expand tree expressions,
especially when they are not part of the CCMP sequence.
OK for commit?
ChangeLog:
2016-02-03 Wilco Dijkstra
gcc/
PR target/69619
* ccmp.c (expand_ccmp_expr_1): Avoid evaluating gs0/gs1
twice when co
4.
Adding the return fixes the regressions.
Committed as trivial in revision 233490.
2016-02-17 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.c (aarch64_internal_mov_immediate):
Add missing return.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
Evandro Menezes wrote:
>
> I have a question though: is it necessary to add the "fp" and "simd"
> attributes to both movsf_aarch64 and movdf_aarch64 as well?
You need at least the "simd" attribute, but providing "fp" as well is clearer
(in principle the TARGET_FLOAT check in the pattern condition
Evandro Menezes wrote:
>
> Please, verify the new "simd" and "fp" attributes for SF and DF.
Both movsf and movdf should be:
(set_attr "simd" "*,yes,*,*,*,*,*,*,*,*")
(set_attr "fp" "*,*,*,yes,yes,yes,yes,*,*,*")
Did you check that with -mcpu=generic+nosimd you get fmov s0, wzr?
In my version
Evandro Menezes wrote:
>
> The meaning of these attributes are not clear to me. Is there a
> reference somewhere about which insns are FP or SIMD or neither?
The meaning should be clear, "fp" is a floating point instruction, "simd" a
SIMD one
as defined in ARM-ARM.
> Indeed, I had to add the Y
the extra int<->FP moves. Placing the integer variant first in the shr pattern
generates far more optimal spill code.
2015-07-27 Wilco Dijkstra
* gcc/config/aarch64/aarch64.md (aarch64_lshr_sisd_or_int_3):
Place integer variant first. (aarch64_ashr_sisd_or
ping
> -Original Message-
> From: Wilco Dijkstra [mailto:wdijk...@arm.com]
> Sent: 27 April 2015 14:37
> To: GCC Patches
> Subject: [PATCH][AArch64] Improve spill code - swap order in shl pattern
>
> Various instructions are supported as integer operations as well
d on AArch64.
OK for commit?
ChangeLog:
2015-09-25 Wilco Dijkstra
* gcc/config/aarch64/aarch64.md (add3):
Block early expansion into 2 add instructions.
(add3_pluslong): New pattern to combine complex
immediates into 2 additions.
---
gcc/config/aarch64/aarch64.md
This patch improves support for instructions that allow FP zero immediate. All
FP compares generated
by various patterns should use aarch64_fp_compare_operand. LDP/STP uses
aarch64_reg_or_fp_zero.
Passes regression on AArch64.
OK for commit?
ChangeLog:
2015-10-08 Wilco Dijkstra
Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. This
can give up to 2x speedup on many AArch64 implementations. Also model the crypto
instructions on Cortex-A57 according to the Optimization Guide.
Passes regression tests.
ChangeLog:
2015-10-14 Wilco Dijkstra
Several instructions accidentally emit wzr/xzr even when the pattern specifies
an immediate. Fix
this by removing the register specifier in patterns that emit immediates.
Passes regression tests. OK for commit?
ChangeLog:
2015-10-28 Wilco Dijkstra
* gcc/config/aarch64/aarch64.md
of the register. This results in better register allocation overall, fewer
spills and reduced codesize - particularly in SPEC2006 gamess.
GCC regression passes with several minor fixes.
OK for commit?
ChangeLog:
2015-11-06 Wilco Dijkstra
* gcc/config/aarch64/aarch64.c
reg_pref to illegal register classes so this kind of issue can be trivially
found with an assert? Also would it not be a good idea to have a single
register copy function that ensures all data is copied?
ChangeLog: 2014-12-09 Wilco Dijkstra wdijk...@arm.com
* gcc/ira-emit.c
compare with zero can be merged into an ALU
operation:
int
f (int a, int b)
{
  a += b;
  return a == 0 || a == 3;
}

f:
	adds	w0, w0, w1
	ccmp	w0, 3, 4, ne
	cset	w0, eq
	ret
Passes GCC regression tests. OK for commit?
ChangeLog:
2015-11-13 Wilco Dijkstra
This patch adds support for FCCMP. This is trivial with the new CCMP
representation - remove the restriction of FP in ccmp.c and add FCCMP
patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected.
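A hedged source-level example of the kind of code the test covers (exact opcodes
depend on compile options):
    int
    f (double a, double b)
    {
      /* Chained FP compares can now combine into FCMP(E) + FCCMP(E) + CSET
         instead of two separate compares and branches.  */
      return a > 1.0 && b < 2.0;
    }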
OK for commit?
ChangeLog:
2015-11-13 Wilco Dijkstra
* gcc/ccmp.c
This patch adds support for rtx costing of CCMP. The cost is the same as
int/FP compare, however comparisons with zero get a slightly larger cost.
This means we prefer emitting compares with zero so they can be merged with
ALU operations.
OK for commit?
ChangeLog:
2015-11-13 Wilco Dijkstra
This patch adds CCMP selection based on rtx costs. This is based on Jiong's
already approved patch
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01434.html with some minor
refactoring and the tests updated.
OK for commit?
ChangeLog:
2015-11-13 Jiong Wang
gcc/
* ccmp.c (expand_ccmp_exp
> Evandro Menezes wrote:
> Hi, Wilco.
>
> It looks good to me, but FCMP is quite different from FCCMP on Exynos M1,
> so it'd be helpful to have distinct types for them. Say, "fcmp{s,d}"
> and "fccmp{s,d}". Would it be acceptable to add this with this patch or
> later?
It would be easy to add f
Bernd Schmidt wrote:
> Sent: 17 November 2015 22:16
> To: Wilco Dijkstra; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 1/4][AArch64] Generalize CCMP support
>
> On 11/13/2015 05:02 PM, Wilco Dijkstra wrote:
> > * gcc/ccmp.c (expand_ccmp_expr): Extract cmp_code from
(v2 version removes 4 enums)
This patch adds support for FCCMP. This is trivial with the new CCMP
representation - remove the restriction of FP in ccmp.c and add FCCMP
patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected.
OK for commit?
ChangeLog:
2015-11-18 Wilco Dijkstra
Jiong Wang
2015-11-18 Wilco Dijkstra
gcc/
* ccmp.c (expand_ccmp_expr_1): Cost the instruction sequences
generated from different expand order. Cleanup enum use.
gcc/testsuite/
* gcc.target/aarch64/ccmp_1.c: Update test.
---
gcc/ccmp.c
Enable instruction fusion of AES instructions on ARM for Cortex-A53 and
Cortex-A57.
OK for commit?
ChangeLog:
2015-11-20 Wilco Dijkstra
* gcc/config/arm/arm.c (arm_cortex_a53_tune): Add AES fusion.
(arm_cortex_a57_tune): Likewise.
(aarch_macro_fusion_pair_p): Add
ch
compares the previously set CC register. The then part does the compare like
a normal compare. The else part contains the integer value of the AArch64
condition that must be set if the if condition is false.
ChangeLog:
2015-11-12 Wilco Dijkstra
* gcc/target.def (gen_ccmp_fir
Yvan Roux wrote:
> I've a question regarding Cortex-A35, I don't see the same
> documentation for it on ARM website as we have for the other cores
> yet, but is AES fusion not beneficial for it or is it planned to do it
> later ?
It's early days for Cortex-A35, GCC 6 just has initial support. When
> James Greenhalgh wrote:
> > Could you please repost this with the word-wrapping issues fixed.
> > I can't apply it to my tree for review or to commit it on your behalf in
> > the current form.
So it looks like Outlook no longer supports sending emails without wrapping and
the maximum is only
Fix PR93565 testcase for ILP32.
Committed as obvious.
testsuite/
* gcc.target/aarch64/pr93565.c: Fix test for ilp32.
--
diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c
b/gcc/testsuite/gcc.target/aarch64/pr93565.c
index 7200f80..fb64f5c 100644
--- a/gcc/testsuite/gcc.target/aarch
Hi Modi,
> The zero extract now matching against other modes would generate a test +
> branch rather
> than the combined instruction which led to the code size regression. I've
> updated the patch
> so that tbnz etc. matches GPI and that brings code size down to <0.2% in
> spec2017 and <0.4% in
The syntax for lane specifiers uses a vector element rather than a vector:
fmls v0.2s, v1.2s, v1.s[1]   // rather than v1.2s[2]
Fix all the lane specifiers to use Vetype which uses the correct element type.
Regress&bootstrap pass.
ChangeLog:
2020-03-06 Wilco Dijkstra
* aar
cores.
Fix this by adding new patterns and intrinsics for widening multiplies, which
results in a 63% speedup for the example in the PR. This fixes the performance
regression.
Passes regress&bootstrap.
ChangeLog:
2020-03-06 Wilco Dijkstra
PR target/91598
* config/aarch6
Hi Christophe,
> I noticed a regression introduced by Delia's patch "aarch64: ACLE
> intrinsics for BFCVTN, BFCVTN2 and BFCVT":
> (on aarch64-linux-gnu)
> FAIL: g++.dg/cpp0x/variadic-sizeof4.C -std=c++14 (internal compiler error)
>
> I couldn't reproduce it with current ToT, until I realized that
Hi,
There is no single PC offset that is correct given CPUs may use different
offsets.
GCC may also schedule the instruction that stores the PC. This feature used to
work on early Arms but is no longer functional or useful today, so the best way
forward is to remove it altogether. There are many
Hi Andrea,
I think the first part is fine when approved, but the 2nd part is problematic
like Szabolcs already pointed out. We can't just change the ABI or semantics,
and these builtins are critical for GLIBC performance. We would first need to
change GLIBC back to using inline assembler so it
Hi Richard,
Thanks for these patches - yes TI mode expansions can certainly be improved!
So looking at your expansions for signed compares, why not copy the optimal
sequence from 32-bit Arm?
Any compare can be done in at most 2 instructions:
void doit(void);
void f(long long a)
{
if (a <= 1)
Hi Richard,
> Any compare can be done in at most 2 instructions:
>
> void doit(void);
> void f(long long a)
> {
> if (a <= 1)
> doit();
> }
>
> f:
> cmp r0, #2
> sbcs r3, r1, #0
> blt .L4
> Well, this one requires that you be able to add 1 to an in
Hi Duanbo,
> This is a simple fix for pr94577.
> The option -mabi=ilp32 should not be used in large code model. Like x86,
> using -mx32 and -mcmodel=large together will result in an error message.
> On aarch64, there is no error message for this option conflict.
> A solution to this problem can b
Any further comments? Note GCC doesn't support S/UMULLS either since it is
equally
useless. It's no surprise that Thumb-2 removed support for flag-setting 64-bit
multiplies,
while AArch64 didn't add flag-setting multiplies. So there is no argument that
these
instructions are in any way useful to
K, OK for commit?
ChangeLog:
2019-09-11 Wilco Dijkstra
* config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
8b92c830de09a3ad49420fdfacde02d8efc2a89b..11212d988a0f56299c2266bace80170d074be56c
100644
--- a/gcc/config/arm/ar
s.
OK for commit until we get rid of it?
ChangeLog:
2017-11-17 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index
056110afb228fb919e837c04aa5e55
nces.
Bootstrapped on AArch64, passes regress, OK for commit?
ChangeLog:
2018-11-09 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.
testsuite/
* gcc.ta
one-linux-gnueabihf --with-cpu=cortex-a57
ChangeLog:
2019-07-29 Wilco Dijkstra
* config/arm/arm.c (arm_option_override): Don't override sched
pressure algorithm.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/
?
ChangeLog:
2019-09-09 Wilco Dijkstra
* config/arm/arm.h (HONOR_REG_ALLOC_ORDER): Set when optimizing for
size.
--
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
8d023389eec469ad9c8a4e88edebdad5f3c23769..e3473e29fbbb964ff1136c226fbe30d35dbf7b39
100644
--- a/gcc
while SPECFP improves 0.2%.
Bootstrap OK, OK for commit?
ChangeLog:
2019-09-09 Wilco Dijkstra
* config/arm/arm.c (arm_legitimize_address): Remove Thumb-2 bailout.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
a5a6a0fab1b4b7ef07931522e7d47e59842
testcase - libquantum and SPECv6
performance improves.
OK for commit?
ChangeLog:
2018-01-22 Wilco Dijkstra
PR target/79262
* config/aarch64/aarch64.c (generic_vector_cost): Adjust
vec_to_scalar_cost.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
Hi Ramana,
> My only question would be whether it's more suitable to use
> optimize_function_for_size_p(cfun) instead as IIRC that gives us a
> chance with lto rather than the global optimize_size.
Yes that is even better and that defaults to optimize_size if cfun isn't
set. I've committed this:
Hi Ramana,
>On Mon, Sep 9, 2019 at 6:03 PM Wilco Dijkstra wrote:
>>
>> Currently arm_legitimize_address doesn't handle Thumb-2 at all, resulting in
>> inefficient code. Since Thumb-2 supports similar address offsets use the Arm
>> legitimization code for Thumb-2
Hi Ramana,
> Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
> spread the range across some v7-a CPUs as well? While they aren't that
> popular today I would suggest you look at them because the defaults for
> v7-a are still to use the Cortex-A8 scheduler and the Cor
Hi Richard,
> If global_char really is a char then isn't that UB?
No, why? We can do all kinds of arithmetic based on pointers, either using
pointer types or converted to uintptr_t. Note that the optimizer actually
creates these expressions, for example arr[N-x] can be evaluated as (&arr[0] + N
Hi,
> the defaults for v7-a are still to use the
> Cortex-A8 scheduler
I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old
now so way beyond obsolete. Even Cortex-A53 is ancient now, but it has an
accurate scheduler that performs surprisingly well on both in-order and
Hi Richard,
>> No - the testcases fail with that.
>
> Hmm, OK. Could you give more details? What does the motivating case
> actually look like?
Well it's now a very long time ago since I first posted this patch but the
failure was in SPEC. It did something like &array[0xff000 - x], presuma
Hi Richard,
> Sure, the "extern array of unknown size" case isn't about section anchors.
> But this part of my message (snipped above) was about the other case
> (objects of known size), and applied to individual objects as well as
> section anchors.
>
> What I was trying to say is: yes, we need b
Hi Christophe,
> I've noticed that your patch caused a regression:
> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
> "internal loop alignment added" 1
That's just a testism - it only tests for loop alignment and doesn't
consider the possibility of the loop being jumped into like