Alexander Monakov writes:
> On Fri, 23 Dec 2022, Jose E. Marchesi wrote:
>
>> > +1 for trying this FWIW. There's still plenty of time to try an
>> > alternative solution if there are unexpected performance problems.
>>
>> Let me see if Alexander's patch fixes the issue at hand (it must) and
>> w
Prathamesh Kulkarni writes:
> Hi Richard,
> For the following (contrived) test:
>
> int32x4_t foo(int32x4_t v)
> {
> v[3] = 0;
> return v;
> }
>
> -O2 code-gen:
> foo:
> fmov    s1, wzr
> ins v0.s[3], v1.s[0]
> ret
>
> I suppose we can instead emit the following code-gen
Lulu Cheng writes:
> Co-authored-by: Yang Yujie
>
> gcc/ChangeLog:
>
> * config/loongarch/loongarch.cc (loongarch_classify_address):
> Add processing for CONST_INT.
> (loongarch_print_operand_reloc): Operand modifier 'c' is supported.
> (loongarch_print_operand): Increase
lehua.d...@rivai.ai writes:
> From: Lehua Ding
>
> PS: Resent to adjust the width of each line of text.
>
> Hi,
>
> When I was adding the new RISC-V auto-vectorization function, I found that
> converting `vector-reg1 vop vector-reg2` to `scalar-reg3 vop vector-reg2`
> is not very easy to handl
Wilco Dijkstra writes:
> Hi,
>
>> @Wilco, can you please send the rebased patch for patch review? We would
>> need it in our openSUSE package soon.
>
> Here is an updated and rebased version:
>
> Cheers,
> Wilco
>
> v4: rebase and add REG_UNSAVED_ARCHEXT.
>
> A recent change only initializes the regs
"丁乐华" writes:
> > I don't think this pattern is correct, because SEL isn't commutative
> > in the vector operands.
>
> Indeed, I think I should invert PRED operand or the comparison
> operator which produce the PRED operand first.
That would work, but it would no longer be a win. The vectoriser
Prathamesh Kulkarni writes:
> On Tue, 17 Jan 2023 at 18:29, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > Hi Richard,
>> > For the following (contrived) test:
>> >
>> > int32x4_t foo(int32x4_t v)
>> > {
>> > v[3] = 0;
>> > return v;
>> > }
>> >
>> > -O2 code-gen:
>> > foo:
>>
Prathamesh Kulkarni writes:
> Hi Richard,
> Based on your suggestion in the other thread, the patch uses
> exact_log2 (INTVAL (operands[2])) >= 0 to gate for vec_merge patterns.
> Bootstrap+test in progress on aarch64-linux-gnu.
> Does it look OK ?
Yeah, this is OK, thanks. IMO it's a latent bug
Christophe Lyon writes:
> The previous patch added an assert which should not be applied to PST
> types (Pure Scalable Types) because alignment does not matter in this
> case. This patch moves the assert after the PST case is handled to
> avoid the ICE.
>
> PR target/108411
> gcc/
>
Christophe Lyon writes:
> As discussed in the PR, these recently added tests fail when the
> testsuite is executed with -fstack-protector-strong. To avoid this,
> this patch adds -fno-stack-protector to dg-options.
>
> PR target/108411
> gcc/testsuite
> * g++.target/aarch64/bitf
Szabolcs Nagy writes:
> The expected way to handle eh_return is to pass the stack adjustment
> offset and landing pad address via
>
> EH_RETURN_STACKADJ_RTX
> EH_RETURN_HANDLER_RTX
>
> to the epilogue that is shared between normal return paths and the
> eh_return paths. EH_RETURN_HANDLER_RTX
Wilco Dijkstra writes:
> A MOPS memmove may corrupt registers since there is no copy of the input
> operands to temporary
> registers. Fix this by calling aarch64_expand_cpymem which does this. Also
> fix an issue with
> STRICT_ALIGNMENT being ignored if TARGET_MOPS is true, and avoid crashing
Richard Earnshaw via Gcc-patches writes:
> Now that we require C++ 11, we can safely forward declare rtx_code
> so that we can use it in target hooks.
>
> gcc/ChangeLog
> * coretypes.h (rtx_code): Add forward declaration.
> * rtl.h (rtx_code): Make compatible with forward declaration.
Richard Earnshaw via Gcc-patches writes:
> Note, this patch is dependent on the patch I posted yesterday to
> forward declare rtx_code in coretypes.h.
>
> --
> Now that we have a forward declaration of rtx_code in coretypes.h, we
> can adjust these hooks to take rtx_code arguments rather than
"Richard Earnshaw (lists)" writes:
> On 23/08/2023 16:49, Richard Sandiford via Gcc-patches wrote:
>> Richard Earnshaw via Gcc-patches writes:
>>> Now that we require C++ 11, we can safely forward declare rtx_code
>>> so that we can use it i
Wilco Dijkstra writes:
> Hi Richard,
>
> (that's quick!)
>
>> + if (size > max_copy_size || size > max_mops_size)
>> +    return aarch64_expand_cpymem_mops (operands, is_memmove);
>>
>> Could you explain this a bit more? If I've followed the logic correctly,
>> max_copy_size will always be 0 for
The scalar FNMADD/FNMSUB and SVE FNMLA/FNMLS instructions mean
that either side of a subtraction can start an accumulator chain.
However, Advanced SIMD doesn't have an equivalent instruction.
This means that, for Advanced SIMD, a subtraction can only be
fused if the second operand is a multiplicati
Richard Biener writes:
> The following adds the capability to do SLP on .MASK_STORE, I do not
> plan to add interleaving support.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
LGTM, thanks.
Richard
> Thanks,
> Richard.
>
> PR tree-optimization/15
> gcc/
> * tree-v
Richard Sandiford writes:
> Rather than hiding this in target code, perhaps we should add a
> target-independent concept of an "eh_return taken" flag, say
> EH_RETURN_TAKEN_RTX.
>
> We could define it so that, on targets that define EH_RETURN_TAKEN_RTX,
> a register EH_RETURN_STACKADJ_RTX and a re
Jeff Law writes:
> On 8/22/23 02:08, juzhe.zh...@rivai.ai wrote:
>> Yes, I agree long-term we want every-thing be optimized as early as
>> possible.
>>
>> However, IMHO, it's impossible to support every conditional pattern
>> in the middle-end (match.pd).
>> It's a really big number.
>>
>
Juzhe-Zhong writes:
> Hi, Richard and Richi.
>
> Currently, GCC support COND_LEN_FMA for floating-point **NO** -ffast-math.
> It's supported in tree-ssa-math-opts.cc. However, GCC failed to support
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
>
> Consider this following case:
> #define TEST_TYPE(T
Just some off-the-cuff thoughts. Might think differently when
I've had more time...
Richard Biener writes:
> On Mon, 28 Aug 2023, Jakub Jelinek wrote:
>
>> Hi!
>>
>> While the _BitInt series isn't committed yet, I had a quick look at
>> lifting the current lowest limitation on maximum _BitInt p
Jeff Law writes:
> On 8/24/23 08:06, Robin Dapp via Gcc-patches wrote:
>> Ping. I refined the code and some comments a bit and added a test
>> case.
>>
>> My question in general would still be: Is this something we want
>> given that we potentially move some of combine's work a bit towards
>> t
excl_hash_traits can be defined more simply by reusing existing traits.
Tested on aarch64-linux-gnu. OK to install?
Richard
gcc/
* attribs.cc (excl_hash_traits): Delete.
(test_attribute_exclusions): Use pair_hash and nofree_string_hash
instead.
---
gcc/attribs.cc | 45
[Sorry for any weird MUA issues, don't have access to my usual set-up.]
> when looking at a riscv ICE in vect-live-6.c I noticed that we
> assume that the variable part (coeffs[1] * x1) of the to-be-extracted
> bit number in extract_bit_field_1 is a multiple of BITS_PER_UNIT.
>
> This means that b
Robin Dapp writes:
>> But in the VLA case, doesn't it instead have precision 4+4X?
>> The problem then is that we can't tell at compile time which
>> byte that corresponds to. So...
>
> Yes 4 + 4x. I keep getting confused with poly modes :)
> In this case we want to extract the bitnum [3 4] = 3
While working on another patch, I hit a problem with the aarch64
expansion of untyped_call. The expander emits the usual:
(set (mem ...) (reg resN))
instructions to store the result registers to memory, but it didn't
say in RTL where those resN results came from. This eventually led
to a fail
While backporting another patch to an earlier release, I hit a
situation in which lra_eliminate_regs_1 would eliminate an address to:
(plus (reg:P R) (const_int 0))
This address compared not-equal to plain:
(reg:P R)
which caused an ICE in a later peephole2. (The ICE showed up in
gfort
Robin Dapp via Gcc-patches writes:
>> It's not just a question of which byte though. It's also a question
>> of which bit.
>>
>> One option would be to code-generate for even X and for odd X, and select
>> between them at runtime. But that doesn't scale well to 2+2X and 1+1X.
>>
>> Otherwise I
Uros Bizjak via Gcc-patches writes:
> On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
> wrote:
>>
>> On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
>> > From: Kong Lingling
>> >
>> > In inline asm, we do not know if the insn can use EGPR, so disable E
Tamar Christina writes:
> Hi All,
>
> In GCC-9 our scalar xorsign pattern broke and we didn't notice it because the
> testcase was not strong enough. With this commit
>
> 8d2d39587d941a40f25ea0144cceb677df115040 is the first bad commit
> commit 8d2d39587d941a40f25ea0144cceb677df115040
> Author: S
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Friday, September 1, 2023 2:36 PM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov
>> Subject: Re: [PATCH]AArch64 xorsign: Fix scalar x
"yanzhang.wang--- via Gcc-patches" writes:
> From: Yanzhang Wang
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/sve/acle/asm/subr_s8.c: Modify subr with -1
> to not.
>
> Signed-off-by: Yanzhang Wang
> ---
>
> Tested on my local arm environment and passed. Thanks Andrew Pinski's
Christophe Lyon via Gcc-patches writes:
> Tests under gcc.dg/vect use check_vect_support_and_set_flags to set
> compilation flags as appropriate for the target, but they also set
> dg-do-what-default to 'run' or 'compile', depending on the actual
> target hardware (or simulator) capabilities.
>
>
Marc Poulhiès via Gcc-patches writes:
> Richard Sandiford via Gcc-patches writes:
>>> +# this regex matches the first line of the "end" in the initial commit
>>> message
>>> +FIRST_LINE_OF_END_RE = re.compile('(?i)^(signed-off-by|co-authored-by|#):
Thiago Jung Bauermann via Gcc-patches writes:
> Since commit e7a36e4715c7 "[PATCH] RISC-V: Support simplify (-1-x) for
> vector." these tests fail on aarch64-linux:
>
> === g++ tests ===
>
> Running g++:g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp ...
> FAIL: gcc.target/aarch
Richard Sandiford writes:
> "yanzhang.wang--- via Gcc-patches" writes:
>> From: Yanzhang Wang
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/aarch64/sve/acle/asm/subr_s8.c: Modify subr with -1
>> to not.
>>
>> Signed-off-by: Yanzhang Wang
>> ---
>>
>> Tested on my local arm environm
Qing Zhao via Gcc-patches writes:
>> On Aug 29, 2023, at 3:42 PM, Marek Polacek via Gcc-patches
>> wrote:
>>
>> Improving the security of software has been a major trend in the recent
>> years. Fortunately, GCC offers a wide variety of flags that enable extra
>> hardening. These flags aren't
Robin Dapp writes:
>> So I don't think I have a good feel for the advantages and disadvantages
>> of doing this. Robin's analysis of the aarch64 changes was nice and
>> detailed though. I think the one that worries me most is the addressing
>> mode one. fwprop is probably the first chance we ge
Thomas Schwinge writes:
> Hi!
>
> On 2023-09-04T23:05:05+0200, I wrote:
>> On 2019-07-16T15:04:49+0100, Richard Sandiford
>> wrote:
>>> This patch therefore adds a new check-function-bodies dg-final test
>
>>> The regexps in parse_function_bodies are fairly general, but might
>>> still need to b
Szabolcs Nagy writes:
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.h (AARCH64_ISA_RCPC): Remove dup.
OK, thanks.
Richard
> ---
> gcc/config/aarch64/aarch64.h | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 2b0fc
Szabolcs Nagy writes:
> EH returns no longer rely on clobbering the return address on the stack
> so forcing a stack frame is not necessary.
>
> This does not actually change the code gen for the unwinder since there
> are calls before the EH return.
>
> gcc/ChangeLog:
>
> * config/aarch64/a
Szabolcs Nagy writes:
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/eh_return-2.c: New test.
> * gcc.target/aarch64/eh_return-3.c: New test.
OK.
I wonder if it's worth using check-function-bodies for -3.c though.
It would then be easy to verify that the autiasp only occurs on t
Szabolcs Nagy writes:
> This is needed since eh_return no longer prevents pac-ret in the
> normal return path.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/return_address_sign_1.c: Move func4 to ...
> * gcc.target/aarch64/return_address_sign_2.c: ... here and fix the
> s
Szabolcs Nagy writes:
> The tests manipulate the return address in abitest-2.h and thus not
> compatible with -mbranch-protection=pac-ret+leaf or
> -mbranch-protection=gcs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/aapcs64/func-ret-1.c: Disable branch-protection.
> * gcc.ta
Szabolcs Nagy writes:
> Update tests for the new branch-protection parser errors.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/branch-protection-attr.c: Update.
> * gcc.target/aarch64/branch-protection-option.c: Update.
OK, thanks. (And I agree these are better messages. :))
Yang Yujie writes:
> @@ -5171,25 +5213,21 @@ case "${target}" in
> # ${with_multilib_list} should not contain whitespaces,
> # consecutive commas or slashes.
> if echo "${with_multilib_list}" \
> - | grep -E -e "[[:space:]]" -e '[,/][,/]' -e '[
Yang Yujie writes:
> gcc/ChangeLog:
>
> * config.gcc: remove non-POSIX syntax "<<<".
OK. Thanks for the quick fix.
Richard.
> ---
> gcc/config.gcc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index b2fe7c7ceef..6d4c8becd28 10
Robin Dapp writes:
> Hi Richard,
>
> I did some testing with the attached v2 that does not restrict to UNARY
> anymore. As feared ;) there is some more fallout that I'm detailing below.
>
> On Power there is one guality fail (pr43051-1.c) that I would take
> the liberty of ignoring for now.
>
> O
When I tried to use config-list.mk, the build for every triple except
the build machine's failed for m2. This is because, unlike other
languages, m2 builds target objects during all-gcc. The build will
therefore fail unless you have access to an appropriate binutils
(or an equivalent). That's qu
Lehua Ding writes:
> Hi,
>
> This patch adds support that tries to fold `MIN (poly, poly)` to
> a constant. Consider the following C Code:
>
> ```
> void foo2 (int* restrict a, int* restrict b, int n)
> {
> for (int i = 0; i < 3; i += 1)
> a[i] += b[i];
> }
> ```
>
> Before this patch:
>
Richard Sandiford writes:
> Lehua Ding writes:
>> Hi,
>>
>> This patch adds support that tries to fold `MIN (poly, poly)` to
>> a constant. Consider the following C Code:
>>
>> ```
>> void foo2 (int* restrict a, int* restrict b, int n)
>> {
>> for (int i = 0; i < 3; i += 1)
>> a[i] += b
Lehua Ding writes:
> V3 change: Address Richard's comments.
>
> Hi,
>
> This patch adds support that tries to fold `MIN (poly, poly)` to
> a constant. Consider the following C Code:
>
> ```
> void foo2 (int* restrict a, int* restrict b, int n)
> {
> for (int i = 0; i < 3; i += 1)
> a[i]
Currently there are four static sources of attributes:
- LANG_HOOKS_ATTRIBUTE_TABLE
- LANG_HOOKS_COMMON_ATTRIBUTE_TABLE
- LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE
- TARGET_ATTRIBUTE_TABLE
All of the attributes in these tables go in the "gnu" namespace.
This means that they can use the traditional GNU __
Jakub Jelinek writes:
> Hi!
>
> The recent pp_wide_int changes for _BitInt support (because not all
> wide_ints fit into the small fixed size digit_buffer anymore) apparently
> broke
> +FAIL: gcc.dg/analyzer/out-of-bounds-diagram-1-debug.c (test for excess
> errors)
> +FAIL: gcc.dg/analyzer/out-o
This series of patches fixes deficiencies in GCC's -fstack-protector
implementation for AArch64 when using dynamically allocated stack space.
This is CVE-2023-4039. See:
https://developer.arm.com/Arm%20Security%20Center/GCC%20Stack%20Protector%20Vulnerability%20AArch64
https://github.com/metaredt
aarch64_layout_frame uses a shorthand for referring to
cfun->machine->frame:
aarch64_frame &frame = cfun->machine->frame;
This patch does the same for some other heavy users of the structure.
No functional change intended.
gcc/
* config/aarch64/aarch64.cc (aarch64_save_callee_saves): U
Following on from the previous bytes_below_saved_regs patch, this one
records the number of bytes that are below the hard frame pointer.
This eventually replaces below_hard_fp_saved_regs_size.
If a frame pointer is not needed, the epilogue adds final_adjust
to the stack pointer before restoring re
After previous patches, it is no longer necessary to calculate
a chain_offset in cases where there is no chain record.
gcc/
* config/aarch64/aarch64.cc (aarch64_expand_prologue): Move the
calculation of chain_offset into the emit_frame_chain block.
---
gcc/config/aarch64/aarch64.c
After previous patches, it no longer really makes sense to allocate
the top of the frame in terms of varargs_and_saved_regs_size and
saved_regs_and_above.
gcc/
* config/aarch64/aarch64.cc (aarch64_layout_frame): Simplify
the allocation of the top of the frame.
---
gcc/config/aarch
When we emit the frame chain, i.e. when we reach Here in this statement
of aarch64_expand_prologue:
if (emit_frame_chain)
{
// Here
...
}
the stack is in one of two states:
- We've allocated up to the frame chain, but no more.
- We've allocated the whole frame, and the fra
aarch64_save_callee_saves and aarch64_restore_callee_saves took
a parameter called start_offset that gives the offset of the
bottom of the saved register area from the current stack pointer.
However, it's more convenient for later patches if we use the
bottom of the entire frame as the reference po
Similarly to the previous locals_offset patch, hard_fp_offset
was described as:
/* Offset from the base of the frame (incoming SP) to the
hard_frame_pointer. This value is always a multiple of
STACK_BOUNDARY. */
poly_int64 hard_fp_offset;
which again took an “upside-down” view: h
reg_offset was measured from the bottom of the saved register area.
This made perfect sense with the original layout, since the bottom
of the saved register area was also the hard frame pointer address.
It became slightly less obvious with SVE, since we save SVE
registers below the hard frame point
If a frame has no saved registers, it can be allocated in one go.
There is no need to treat the areas below and above the saved
registers as separate.
And if we allocate the frame in one go, it should be allocated
as the initial_adjust rather than the final_adjust. This allows the
frame size to g
This patch fixes another case in which a value was described with
an “upside-down” view.
gcc/
* config/aarch64/aarch64.h (aarch64_frame::frame_size): Tweak comment.
---
gcc/config/aarch64/aarch64.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/gcc/config/aarch64/
This patch just changes a calculation of initial_adjust
to one that makes it slightly more obvious that the total
adjustment is frame.frame_size.
gcc/
* config/aarch64/aarch64.cc (aarch64_layout_frame): Tweak
calculation of initial_adjust for frames in which all saves
are S
-fstack-clash-protection uses the save of LR as a probe for the next
allocation. The next allocation could be:
* another part of the static frame, e.g. when allocating SVE save slots
or outgoing arguments
* an alloca in the same function
* an allocation made by a callee function
However, whe
The frame layout code currently hard-codes the assumption that
the number of bytes below the saved registers is equal to the
size of the outgoing arguments. This patch abstracts that
value into a new field of aarch64_frame.
gcc/
* config/aarch64/aarch64.h (aarch64_frame::bytes_below_saved
The AArch64 ABI says that, when stack clash protection is used,
there can be a maximum of 1KiB of unprobed space at sp on entry
to a function. Therefore, we need to probe when allocating
>= guard_size - 1KiB of data (>= rather than >). This is what
GCC does.
If an allocation is exactly guard_siz
After previous patches, it's no longer necessary to store
saved_regs_size and below_hard_fp_saved_regs_size in the frame info.
All measurements instead use the top or bottom of the frame as
reference points.
gcc/
* config/aarch64/aarch64.h (aarch64_frame::saved_regs_size)
(aarch64_
locals_offset was described as:
/* Offset from the base of the frame (incoming SP) to the
top of the locals area. This value is always a multiple of
STACK_BOUNDARY. */
This is implicitly an “upside down” view of the frame: the incoming
SP is at offset 0, and anything N bytes below
Previous patches ensured that the final frame allocation only needs
a probe when the size is strictly greater than 1KiB. It's therefore
safe to use the normal 1024 probe offset in all cases.
The main motivation for doing this is to simplify the code and
remove the number of special cases.
gcc/
AArch64 normally puts the saved registers near the bottom of the frame,
immediately above any dynamic allocations. But this means that a
stack-smash attack on those dynamic allocations could overwrite the
saved registers without needing to reach as far as the stack smash
canary.
The same thing co
The stack frame is currently divided into three areas:
A: the area above the hard frame pointer
B: the SVE saves below the hard frame pointer
C: the outgoing arguments
If the stack frame is allocated in one chunk, the allocation needs a
probe if the frame size is >= guard_size - 1KiB. In additio
Wilco Dijkstra writes:
> List official cores first so that -cpu=native does not show a codename with -v
> or in errors/warnings.
Nice spot.
> Passes regress, OK for commit?
>
> gcc/ChangeLog:
> * config/aarch64/aarch64-cores.def (neoverse-n1): Place before ares.
> (neoverse-v1):
In the following test:
svuint8_t ld(uint8_t *ptr) { return svld1rq(svptrue_b8(), ptr + 2); }
ptr + 2 is a valid address for an Advanced SIMD load, but not for
an SVE load. We therefore ended up generating:
ldr q0, [x0, 2]
dup z0.q, z0.q[0]
This patch makes us generate
AArch64 previously costed WHILELO instructions on the first call
to add_stmt_cost. This was because, at the time, only add_stmt_cost
had access to the loop_vec_info.
However, after the AVX512 changes, we only calculate the masks later.
This patch moves the WHILELO costing to finish_cost, which is
aarch64_operands_ok_for_ldpstp contained the code:
/* One of the memory accesses must be a mempair operand.
If it is not the first one, they need to be swapped by the
peephole. */
if (!aarch64_mem_pair_operand (mem_1, GET_MODE (mem_1))
&& !aarch64_mem_pair_operand (mem_2, GET
Juzhe-Zhong writes:
> According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751
>
> As Richard and Richi suggested, we recognize uninitialized SSA_NAME and
> convert it
> into SCRATCH rtx if the target predicate allows SCRATCH.
>
> It can help to reduce redundant data move instructions
Wilco Dijkstra writes:
> Support immediate expansion of immediates which can be created from 2 MOVKs
> and a shifted ORR or BIC instruction. Change aarch64_split_dimode_const_store
> to apply if we save one instruction.
>
> This reduces the number of 4-instruction immediates in SPECINT/FP by 5%.
Prathamesh Kulkarni writes:
> Hi,
> After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's a following
> regression:
> FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times
> ins\\tv0.s\\[1\\], v1.s\\[0\\] 3
>
> This happens because for the following function from vect_copy_lane_1.c:
Juzhe-Zhong writes:
> According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751
>
> As Richard and Richi suggested, we recognize uninitialized SSA_NAME and
> convert it
> into SCRATCH rtx if the target predicate allows SCRATCH.
>
> It can help to reduce redundant data move instructions
Kewen Lin writes:
> This costing adjustment patch series exposes one issue in
> aarch64 specific costing adjustment for STP sequence. It
> causes the below test cases to fail:
>
> - gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c
> - gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
> - gcc/tests
Richard Biener writes:
>> @@ -812,33 +997,80 @@ split_constant_offset_1 (tree type, tree op0, enum
>> tree_code code, tree op1,
>> }
>> }
>>
>> -/* Expresses EXP as VAR + OFF, where off is a constant. The type of OFF
>> - will be ssizetype. */
>> +/* If EXP has pointer type, try to ex
Christophe Lyon writes:
> On Wed, 9 Dec 2020 at 17:47, Richard Sandiford
> wrote:
>>
>> Christophe Lyon via Gcc-patches writes:
>> > Hi,
>> >
>> > I've been working for a while on enabling auto-vectorization for ARM
>> > MVE, and I find it a bit awkward to keep things common with Neon as
>> > mu
Richard Biener writes:
> Pattern recog incompletely handles some bool cases but we shouldn't
> miscompile as a result but not vectorize. Unfortunately
> vectorizable_assignment lets invalid conversions (that
> vectorizable_conversion rejects) slip through. The following
> rectifies that.
>
> Boo
Richard Biener writes:
> On Fri, 11 Dec 2020, Richard Sandiford wrote:
>> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> > index a4980a931a9..d3ab8aa1c29 100644
>> > --- a/gcc/tree-vect-stmts.c
>> > +++ b/gcc/tree-vect-stmts.c
>> > @@ -5123,6 +5123,17 @@ vectorizable_assignment (v
Richard Biener writes:
> On Fri, 11 Dec 2020, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Fri, 11 Dec 2020, Richard Sandiford wrote:
>> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> >> > index a4980a931a9..d3ab8aa1c29 100644
>> >> > --- a/gcc/tree-vect-stmts.
Przemyslaw Wirkus writes:
> This patch adds support for -mcpu=cortex-a78c command line option.
> For more information about this processor, see [0]:
>
> [0] https://developer.arm.com/ip-products/processors/cortex-a/cortex-a78c
>
> OK for master ?
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch
Przemyslaw Wirkus writes:
> Hi,
>
> Recent 'support SVE comparisons for unpacked integers' patch extends
> operands of define_expands from SVE_FULL to SVE_ALL. This causes an ICE
> hence this PR patch.
>
> This patch adds this relaxation for:
> + reduc__scal_ and
> + arch64_pred_reduc__
> in order
Rearranging slightly…
> @@ -708,6 +713,10 @@ (define_c_enum "unspec"
> UNSPEC_FCMLA90 ; Used in aarch64-simd.md.
> UNSPEC_FCMLA180 ; Used in aarch64-simd.md.
> UNSPEC_FCMLA270 ; Used in aarch64-simd.md.
> +UNSPEC_FCMUL ; Used in aarch64-simd.md.
> +UNSPEC_FCMUL180 ;
Tamar Christina writes:
> Hi Richard,
>
> Do you object to me splitting off complex add and addressing your remaining
> feedback later when the rewrite of mul and fma are done.
No, sounds good to me.
Thanks,
Richard
Jeff Law writes:
> On 11/13/20 1:20 AM, Richard Sandiford via Gcc-patches wrote:
>> This patch adds some classes for gathering the list of registers
>> and memory that are read and written by an instruction, along
>> with various properties about the accesses. In some ways i
Tamar Christina writes:
> Hi Richard,
>
> Here's the split off complex add.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Checked with armv8-a+sve2+fp16 and no issues. Note that due to a mid-end
> limitation SLP for SVE currently fails for some permutes. The tests have
>
Nathan Sidwell writes:
> Apparently 'var+=...' is not a dash thing. Fixed thusly.
>
> * config.m4: Avoid non-dash idiom
> * configure: Rebuilt.
>
> pushed (2 patches, because I didn't look carefully enough the first time)
Thanks. I think the other uses of += need the same treatme
"Maciej W. Rozycki" writes:
> On Tue, 15 Dec 2020, Jeff Law wrote:
>
>> > @@ -1942,7 +1942,7 @@ gen_divdf3_cc (rtx operand0 ATTRIBUTE_UN
>> >gen_rtx_DIV (DFmode,
>> >operand1,
>> >operand2),
>> > - const0_rtx)),
>> > + CONST_DOUBLE_ATOF ("0", VOIDmode))),
>> >gen_rtx_SET