Przemyslaw Wirkus writes:
> > This is a bug in the vectoriser: the vectoriser shouldn't generate
> > IFN_REDUC_MAX calls that the target doesn't support.
> >
> > I think the problem comes from using the wrong interface to get the index
> > type for a COND_REDUCTION. vectorizable_reduction has:
>
Martin Sebor writes:
> On 11/26/20 10:06 AM, Richard Sandiford wrote:
>> Martin Sebor writes:
>>> I do have one concern: the tendency to prioritize efficiency
>>> over safety (this can be said about most GCC code). Specifically
>>> in this class, the address bit twiddling makes me uneasy. I don'
Jeff Law via Gcc-patches writes:
> On 11/13/20 1:19 AM, Richard Sandiford via Gcc-patches wrote:
>> In some cases, it can be convenient to roll back the changes that
>> have been made by validate_change to see how things looked before,
>> then reroll the changes. For e
Jeff Law writes:
> On 11/13/20 1:15 AM, Richard Sandiford via Gcc-patches wrote:
>> We already have two splay tree implementations: the old C one in
>> libiberty and a templated reimplementation of it in typed-splay-tree.h.
>> However, they have some drawbacks:
>&
Jeff Law via Gcc-patches writes:
> On 11/13/20 1:23 AM, Richard Sandiford via Gcc-patches wrote:
>> This patch adds the RTL SSA infrastructure itself. The following
>> fwprop.c patch will make use of it.
>>
>> gcc/
>> * configure.ac: Add rtl-ssa to t
Jeff Law via Gcc-patches writes:
> On 11/13/20 1:24 AM, Richard Sandiford via Gcc-patches wrote:
>> This patch rewrites fwprop.c to use the RTL SSA framework. It tries
>> as far as possible to mimic the old behaviour, even in cases where
>> that doesn't fit naturally wit
Jeff Law writes:
> On 11/13/20 1:21 AM, Richard Sandiford via Gcc-patches wrote:
>> This patch adds a routine for finding a “simple” SET for a register
>> definition. See the comment in the patch for details.
>>
>> gcc/
>> * rtl.h (simple_regno_s
Rainer Orth writes:
> Hi Kyryll,
>
>>> Fixed by moving the memmove.h include in rtl-ssa.h before tm_p.h.
>>>
>>> Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11 (both into stage
>>> 3 now, so the compilation error is gone).
>>>
>>> Ok for master?
>>
>> AFAIK simple patches like this that
Kyrylo Tkachov via Gcc-patches writes:
> Hi all,
>
> While experimenting with some backend costs for Advanced SIMD and SVE I hit
> many cases where GCC would pick SVE for VLA auto-vectorisation even when the
> backend very clearly presented cheaper costs for Advanced SIMD.
> For a simple float add
Tom Tromey writes:
>>>>>> "Richard" == Richard Sandiford via Gcc-patches
>>>>>> writes:
>
> Richard> +// A class that stores a choice "A or B", where A has type T1 * and B has
> Richard> +// type T2 *. Both T1 and T2
I'd used reg_raw_mode[regno] for registers in general, even though
the array is only valid for hard registers. This patch uses
regno_reg_rtx instead.
Tested on i686-linux-gnu, committed as obvious.
Richard
gcc/
PR rtl-optimization/98347
* rtl-ssa/access-utils.h (full_register): Us
On AArch64, the vectoriser tries various ways of vectorising with both
SVE and Advanced SIMD and picks the best one. All other things being
equal, it prefers earlier attempts over later attempts.
The way this works currently is that, once it has a successful
vectorisation attempt A, it analyses a
When compiling with -msve-vector-bits=128, aarch64_preferred_simd_mode
would pass the same vector width to aarch64_simd_container_mode for
both SVE and Advanced SIMD, and so Advanced SIMD would always “win”.
This patch instead makes it choose directly between SVE and Advanced
SIMD modes, so that aa
In the PR, fwprop was changing a call instruction and tripped
an assert when trying to update a list of call clobbers.
There are two ways we could handle this: remove the call clobber
and then add it back, or assume that the clobber will stay in its
current place.
At the moment we don't have enoug
"Qian, Jianhua" writes:
> Hi Richard
>
> Thanks for reviewing again.
> I have updated the patch to v3.
Thanks, pushed to master now that the copyright assignment is on file.
Richard
Alexandre Oliva writes:
> The implicit -mlong-calls used in our arm-vxworks configurations
> changes the register allocation patterns in the arm/fp16-aapcs-2.c
> test: r3 ends up used in the long-call sequence, and we end up using
> ip as a temporary, which doesn't match the expected mov patterns.
Alexandre Oliva writes:
> The headmerge tests pass a constant to conditional calls, so that the
> same constant is always passed to a function, though it's a different
> function depending on which path is taken.
>
> The test checks that the constant appears only once in the assembly
> output, as
Alexandre Oliva writes:
> The implicit -mlong-calls from our vxworks configurations makes the
> tail-call instructions differ from those expected by the
> no_unique_address tests in gcc.target/arm.
>
> This patch adds -mno-long-calls to the compilation commands, so that
> we generate the expected
Alexandre Oliva writes:
> The implicit -mlong-calls used in our vxworks configurations changes
> the call sequences from those expected in the mve_libcall testcases.
>
> This patch brings the test output in line with the expectations, with
> an explicit -mno-long-calls.
>
> Regstrapped on x86_64-l
Alexandre Oliva writes:
> Explicitly disable some vxworks-missing features in the testsuite, that
> the current feature tests detect as present.
>
> Regstrapped on x86_64-linux-gnu, and tested with -x-arm-wrs-vxworks7r2.
> Ok to install?
>
>
> from Olivier Hainque
> for gcc/testsuite/ChangeLog
>
This PR is about a case in which the vectoriser was feeding
incorrect alignment information to tree-data-ref.c, leading
to incorrect runtime alias checks. The alignment was taken
from the TREE_TYPE of the DR_REF, which in this case was a
COMPONENT_REF with a normally-aligned type. However, the
un
aarch64's *add<mode>3_poly_1 has a pattern with the constraints:
"=...,r,&r"
"...,0,rk"
"...,Uai,Uat"
i.e. the penultimate alternative requires operands 0 and 1 to match,
but the final alternative does not allow them to match.
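(As a reading of the standard constraint letters quoted above: the "0" in the
penultimate alternative forces operand 1 into the same register as operand 0,
while the "&" earlyclobber in the final alternative requires operand 0 not to
overlap any input, including operand 1, so the two alternatives place opposite
requirements on whether the operands may share a register.)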
The register allocators dealt with this correctly, and so used
differ
In this testcase we end up with:
unsigned long long x = ...;
char y = (char) (x << 37);
The overwidening pattern realised that only the low 8 bits
of x << 37 are needed, but then tried to turn that into:
unsigned long long x = ...;
char y = (char) x << 37;
which gives an out-of-range sh
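A minimal C sketch of the two forms, written here for illustration rather than
taken from the PR testcase, to spell out why the narrowed shift has to be
rejected:
unsigned long long x;
char
ok (void)
{
  /* The shift happens in 64 bits, where a count of 37 is in range;
     only the final result is truncated to 8 bits.  */
  return (char) (x << 37);
}
char
bad (void)
{
  /* The attempted narrowing: the shift would have to happen in a type
     narrower than 64 bits, but 37 is not a valid shift count for any
     such type, so the transformation must be rejected (as C source this
     form is undefined behaviour too).  */
  return (char) x << 37;
}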
The static GET_MODE_MASKs for SVE vectors are based on the
static precisions, which in turn are based on 128-bit SVE.
The precisions are later updated based on -msve-vector-bits
(usually to become variable length), but the GET_MODE_MASK
stayed the same. This caused combine to fold:
(*_extract:D
"H.J. Lu" writes:
> On Thu, Dec 31, 2020 at 7:57 AM Richard Sandiford via Gcc-patches
> wrote:
>>
>> aarch64's *add<mode>3_poly_1 has a pattern with the constraints:
>>
>> "=...,r,&r"
>> "...,0,rk"
>> "...,Uai,
Segher Boessenkool writes:
> It isn't likely that any other pass would try to create this pattern,
> but this isn't guaranteed, and such other passes do not necessarily do
> the add-the-clobber (that is specific to combine, even!) Maybe fwprop
> could create this insn, or something like Richard's
This patch fixes a mode/rtx mismatch for ILP32 targets in:
mem = force_const_mem (ptr_mode, imm);
where imm can be Pmode rather than ptr_mode.
The patch uses convert_memory_address to convert the Pmode address
to ptr_mode before the call. However, immediate addresses can in
general co
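A minimal sketch of the conversion described, for illustration only (the real
aarch64 code handles further cases, as the message goes on to say):
  /* imm may be a Pmode address even though the constant-pool entry is
     created in ptr_mode (32 bits for ILP32), so convert it first.  */
  imm = convert_memory_address (ptr_mode, imm);
  mem = force_const_mem (ptr_mode, imm);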
The IFN_MASK* functions take two leading arguments: a load or
store pointer and a “cookie”. The type of the cookie is the
type of the access for TBAA purposes (like for MEM_REFs)
while the value of the cookie is the alignment of the access.
This PR was caused by a disagreement about whether the al
The expansions of the svprf[bhwd] instructions weren't taking
advantage of the immediate addressing mode.
Tested on aarch64-linux-gnu and aarch64_be-elf. Pushed to trunk so far.
Will backport to GCC 10 “soon”.
Richard
gcc/
* config/aarch64/aarch64.c (offset_6bit_signed_scaled_p): New f
This patch fixes a codegen regression in the handling of things like:
__temp.val[0] \
= vcombine_##funcsuffix (__b.val[0], \
vcreate_##funcsuffix (__AARCH64_UINT64_C
Tamar Christina writes:
> Hi All,
>
> I have been looking into a class of problems where GCC is not recognizing
> a subreg of lane 0 (using little-endian as an example) of a vector register
> when passing that to an instruction.
>
> As an example consider
>
> poly64_t
> testcase (uint8x16_t input
Andreas Schwab writes:
> That doesn't build with gcc 4.8:
Which subversion are you using? It works for me with stock gcc 4.8.5,
which is what I'd used to test the series for C++ compatibility.
Richard
>
> In file included from ../../gcc/splay-tree-utils.h:491:0,
> from ../../gc
Andreas Schwab writes:
> On Jan 04 2021, Richard Sandiford wrote:
>
>> Andreas Schwab writes:
>>> That doesn't build with gcc 4.8:
>>
>> Which subversion are you using?
>
> This is 4.8.1.
Hmm, OK. I guess that raises the question whether “supporting GCC 4.8”
means supporting every patchlevel, o
PR98560 is about a case in which the vectoriser initially generates:
mask_1 = a < 0;
mask_2 = mask_1 & ...;
res = VEC_COND_EXPR ;
The vectoriser thus expects res to be calculated using vcond_mask.
However, we later manage to fold mask_2 to mask_1, leaving:
mask_1 = a < 0;
res = VEC_CON
This patch follows on from the previous one for the PR and
makes sure that we can handle == as well as <. Previously
we assumed without checking that IFN_VCONDEQ was available
if IFN_VCOND or IFN_VCONDU wasn't.
The patch also fixes the definition of the IFN_VCOND* functions.
The optabs are conver
"Maciej W. Rozycki" writes:
> On Wed, 16 Dec 2020, Maciej W. Rozycki wrote:
>
>> > CONST_DOUBLE_ATOF ("0", VOIDmode) seems malformed though, and I'd expect
>> > it to assert in REAL_MODE_FORMAT (via the format_helper constructor).
>> > I'm not sure the patch is strictly safer than the status quo.
Vladimir Makarov via Gcc-patches writes:
> The following fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97978
>
> The patch was successfully bootstrapped on x86-64.
Can you explain this a bit more? The assert fires if the register
allocation is inconsistent with the conflict information.
Richard Biener writes:
> On Wed, 6 Jan 2021, Richard Sandiford wrote:
>
>> PR98560 is about a case in which the vectoriser initially generates:
>>
>> mask_1 = a < 0;
>> mask_2 = mask_1 & ...;
>> res = VEC_COND_EXPR ;
>>
>> The vectoriser thus expects res to be calculated using vcond_mask.
This patch extends the conditional unary integer operations
from SVE_FULL_I to SVE_I. In each case the type suffix is
taken from the element size rather than the container size:
this matters for ABS and NEG, but doesn't matter for NOT.
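(To illustrate why ABS and NEG care, with reasoning not taken from the patch:
if 32-bit elements are held in 64-bit containers, an absolute value computed at
the container size can give the wrong low 32 bits whenever the upper half is
not a sign-extension of the lower half, whereas the element-size form gives the
intended per-element result. NOT is purely bitwise, so either suffix yields the
same bits.)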
Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to tru
Tamar Christina writes:
> Hi All,
>
> This fixes a logical inconsistency with the SVE2 ACLE tests where the SVE2
> tests
> are checking for SVE support in the assembler instead of SVE2.
>
> This makes all these tests fail when the user has an SVE enabled assembler but
> not an SVE2 one.
>
> Ok fo
This patch extends the conditional UXT patterns from SVE_FULL_I
to SVE_I. It doesn't matter in this case whether the type suffix
is taken from the element size or the container size.
Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.
Richard
gcc/
* config/aarch64/aarch64-
This patch adds unpacked support for unconditional and
conditional CNOT. The type suffix has to be taken from
the element size rather than the container size.
Tested on aarch64-linux-gnu and aarch64_be-elf. Pushed to trunk.
Richard
gcc/
* config/aarch64/aarch64-sve.md (*cnot): Extend
Qian Jianhua writes:
> This patch add cost tables for A64FX.
>
> ChangeLog:
> 2021-01-08 Qian jianhua
>
> gcc/
> * config/aarch64/aarch64-cost-tables.h (a64fx_extra_costs): New.
> * config/aarch64/aarch64.c (a64fx_addrcost_table): New.
> (a64fx_regmove_cost, a64fx_vector_cost):
"Qian, Jianhua" writes:
> Hi Richard
>
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Friday, January 8, 2021 7:04 PM
>> To: Qian, Jianhua/钱 建华
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH v2] aarch64: Add cpu cost tables for A64FX
>>
>> Qian Jianhua writes:
>> > Th
This patch adds support for unpacked SVE LSL, ASR and LSR.
For right shifts, the type suffix needs to be taken from the
element size rather than the container size.
Tested on aarch64-linux-gnu and aarch64_be-elf. Pushed to trunk.
Richard
gcc/
* config/aarch64/aarch64-sve.md (3)
This patch makes the SVE_INT_BINARY_IMM patterns support
unpacked arithmetic, covering MUL, SMAX, SMIN, UMAX and UMIN.
For min and max, the type suffix must be taken from the element
size rather than the container size.
The XFAILs are due to PR98602.
Tested on aarch64-linux-gnu and aarch64_be-elf
This patch adds support for conditional binary ADD, SUB, MUL, SMAX,
UMAX, SMIN, UMIN, LSL, LSR, ASR, AND, ORR and EOR. It's not really
possible to split it up further given how the patterns are written.
Min, max and right-shift need the element size rather than the container
size. The others wou
This patch extends the ADR patterns to handle unpacked vectors.
They would work with both elements and containers, but since
the instructions only support .s and .d, we get more coverage
by using containers.
Tested on aarch64-linux-gnu and aarch64_be-elf. Pushed to trunk.
Richard
gcc/
This patch adds support for unpacked SVE SABD and UABD.
It also rewrites the patterns so that they match as combine
patterns without the need for REG_EQUAL notes. Finally,
there was no pattern for merging with the second input,
which can be handled by reversing the operands.
The type suffix needs
This patch extends the SMULH and UMULH support to unpacked vectors.
The type suffix must be taken from the element size rather than the
container size.
The main use of these patterns is to support division and modulus
by a constant. The conditional forms would be hard to trigger from
non-ACLE cod
This patch adds support for unpacked conditional BIC. The type suffix
could be taken from the element size or the container size, so the
patch continues to use the element size. This is consistent with
the existing support for unconditional BIC.
Tested on aarch64-linux-gnu and aarch64_be-elf. P
This patch adds support for both conditional and unconditional unpacked
ASRD. This meant adding a new define_insn for the unconditional form,
instead of reusing the conditional instructions. It also meant
extending the current conditional patterns to support merging with
any independent value, no
This is a repost of:
https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539763.html
which was initially posted during stage 4. (And yeah, I only just
missed stage 4 again.)
IMO it would be better to fix the bug directly (as the patch tries
to do) instead of waiting for a more thorough redes
Iain Sandoe writes:
> Hi,
>
> The armv8_arm manual [C6.2.226, ROR (immediate)] uses a # in front
> of the immediate rotation quantity.
>
> Although, it seems, GAS is able to infer the # (or is lenient about
> its absence), assemblers based on the LLVM back end expect it and error out.
>
> tested o
Iain Sandoe writes:
> Hi Richard,
>
> Richard Sandiford wrote:
>
>> Iain Sandoe writes:
>
>>> The armv8_arm manual [C6.2.226, ROR (immediate)] uses a # in front
>>> of the immediate rotation quantity.
>>>
>>> Although, it seems, GAS is able to infer the # (or is lenient about
>>> its absence) a
This patch fixes a regression on sh4 introduced by the rtl-ssa stuff.
The port had a pattern:
(define_insn "movsf_ie"
[(set (match_operand:SF 0 "general_movdst_operand"
"=f,r,f,f,fy, f,m, r, r,m,f,y,y,rf,r,y,<,y,y")
(match_operand:SF 1 "general_movsrc_oper
Noticed while looking at something else that the comment above
def_lookup got the description of the comparisons the wrong way
round.
Tested on aarch64-linux-gnu and pushed as obvious.
Richard
gcc/
* rtl-ssa/accesses.h (def_lookup): Fix order of comparison results.
---
gcc/rtl-ssa/acce
Noticed while testing on a different machine that the sve/sel_*.c
tests require .variant_pcs support but don't test for it.
.variant_pcs post-dates SVE so there shouldn't be a need to test
for both.
Tested on aarch64-linux-gnu & pushed.
Richard
gcc/testsuite/
* gcc.target/aarch64/sve/se
This patch extends the MLA/MAD patterns to support unpacked
integer vectors. The type suffix could be either the element
size or the container size, but using the element size should
be more efficient.
Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.
Richard
gcc/
* conf
This patch extends the MLS/MSB patterns to support unpacked
integer vectors. The type suffix could be either the element
size or the container size, but using the element size should
be more efficient.
Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.
Richard
gcc/
* conf
At the moment, if we use only one vector of an LD4 result,
we'll treat the LD4 as having the cost of a single load.
But all 4 loads and any associated permutes take place
regardless of which results are actually used.
This patch therefore counts the cost of unused LOAD_LANES
results against the fi
s/ref/reg/ on a previously unused function name.
Sorry for the blunder. Tested on aarch64-linux-gnu, aarch64_be-elf
and x86_64-linux-gnu, pushed as obvious.
Richard
gcc/
* rtl-ssa/functions.h (function_info::ref_defs): Rename to...
(function_info::reg_defs): ...this.
*
Noticed while working on something else that the insn_change_watermark
destructor could call cancel_changes for changes that no longer exist.
The loop in cancel_changes is a nop in that case, but:
num_changes = num;
can mess things up.
I think this would only affect nested uses of insn_change_
This patch adds a small target-specific pass to remove redundant SVE
PTEST instructions. There are two important uses of this:
- Removing PTESTs after WHILELOs (PR88836). The original testcase
no longer exhibits the problem due to more recent optimisations,
but it can still be seen in simple
Tamar Christina writes:
> Hi All,
>
> This adds implementation for the optabs for complex operations. With this the
> following C code:
>
> void g (float complex a[restrict N], float complex b[restrict N],
> float complex c[restrict N])
> {
> for (int i=0; i < N; i++)
> c[i]
Hongtao Liu via Gcc-patches writes:
> Hi:
> If SRC had been assigned a mode narrower than the copy, we can't link
> DEST into the chain even if they have the same
> hard_regno_nregs (i.e. HImode/SImode in the i386 backend).
In general, changes between modes within the same hard register are OK.
Could you e
Hongtao Liu writes:
> On Mon, Jan 18, 2021 at 6:18 PM Richard Sandiford
> wrote:
>>
>> Hongtao Liu via Gcc-patches writes:
>> > Hi:
>> > If SRC had been assigned a mode narrower than the copy, we can't link
>> > DEST into the chain even if they have the same
>> > hard_regno_nregs (i.e. HImode/SImode i
Richard Biener writes:
> On Mon, 18 Jan 2021, Jan Hubicka wrote:
>
>> > This is a repost of:
>> >
>> > https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539763.html
>> >
>> > which was initially posted during stage 4. (And yeah, I only just
>> > missed stage 4 again.)
>> >
>> > IMO it
Qing Zhao writes:
>>>> D will keep all initialized aggregates as aggregates and live which
>>>> means stack will be allocated for it. With A the usual optimizations
>>>> to reduce stack usage can be applied.
>>>
>>> I checked the routine “poverties::bump_map” in 511.povray_r since it
>>> has a l
Jan Hubicka writes:
>> >>
>> >> Well, in tree-ssa code we do assume these to be either disjoint objects
>> >> or equal (in decl_refs_may_alias_p that continues in case
>> >> compare_base_decls is -1). I am not sure if we win much by threating
>> >> them differently on RTL level. I would preffer
Kyrylo Tkachov via Gcc-patches writes:
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index
> 6efc7706a41e02d947753a4cda984159b68bd39f..27e9026d9e8b7ff980c5b8d9ff1b00490e3a18cb
> 100644
> --- a/gcc/config/aarch64/aarch64-simd-built
Hongtao Liu writes:
> On Mon, Jan 18, 2021 at 7:10 PM Richard Sandiford
> wrote:
>>
>> Hongtao Liu writes:
>> > On Mon, Jan 18, 2021 at 6:18 PM Richard Sandiford
>> > wrote:
>> >>
>> >> Hongtao Liu via Gcc-patches writes:
>> >> > Hi:
>> >> > If SRC had been assigned a mode narrower than the
Hans-Peter Nilsson writes:
> On Tue, 19 Jan 2021, Jakub Jelinek wrote:
>
>> On Mon, Jan 18, 2021 at 11:50:56PM -0500, Hans-Peter Nilsson wrote:
>> > On Mon, 18 Jan 2021, John David Anglin wrote:
>> > > The hppa target is a reload target and asm goto is not supported on
>> > > reload targets.
>> >
Jakub Jelinek via Gcc-patches writes:
> On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-patches
> wrote:
>> > actually only the lower 16bits are needed, the original insn is like
>> >
>> > .294.r.ira
>> > (insn 69 68 70 13 (set (
Jonathan Wright writes:
> Hi,
>
> As subject, this patch rewrites integer mls Neon intrinsics to use
> a - b * c rather than inline assembly code, allowing for better
> scheduling and optimization.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> If ok, please co
Jan Hubicka writes:
>> On Mon, 18 Jan 2021, Richard Sandiford wrote:
>>
>> > Jan Hubicka writes:
>> > >> >>
>> > >> >> Well, in tree-ssa code we do assume these to be either disjoint
>> > >> >> objects
>> > >> >> or equal (in decl_refs_may_alias_p that continues in case
>> > >> >> compare_base
duplicate_and_interleave is the main fallback way of loading
a repeating sequence of elements into variable-length vectors.
The code handles cases in which the number of elements in the
sequence is potentially several times greater than the number
of elements in a vector.
Let:
- NE be the (compil
Hongtao Liu writes:
> On Wed, Jan 20, 2021 at 12:10 AM Richard Sandiford
> wrote:
>>
>> Jakub Jelinek via Gcc-patches writes:
>> > On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via
>> > Gcc-patches wrote:
>> >> > actually onl
Richard Biener writes:
> diff --git a/gcc/hwint.h b/gcc/hwint.h
> index 127b0130c66..8812bc7150f 100644
> --- a/gcc/hwint.h
> +++ b/gcc/hwint.h
> @@ -333,4 +333,46 @@ absu_hwi (HOST_WIDE_INT x)
>return x >= 0 ? (unsigned HOST_WIDE_INT)x : -(unsigned HOST_WIDE_INT)x;
> }
>
> +/* Compute the
Ilya Leoshkevich via Gcc-patches writes:
> On Tue, 2021-01-19 at 09:41 +0100, Richard Biener wrote:
>> On Mon, Jan 18, 2021 at 11:04 PM Ilya Leoshkevich via Gcc-patches
>> wrote:
>> >
>> Suppose we have:
>> >
>> > (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
>> > (set (reg:FPRX2 66) (su
Wilco Dijkstra via Gcc-patches writes:
> In aarch64_classify_symbol symbols are allowed large offsets on relocations.
> This means the offset can use all of the +/-4GB offset, leaving no offset
> available for the symbol itself. This results in relocation overflow and
> link-time errors for simpl
Ilya Leoshkevich writes:
> On Thu, 2021-01-21 at 10:49 +, Richard Sandiford wrote:
>> What prevents combine from handling this? Are the instructions in
>> different blocks?
>
> I wanted to do this before combine, because in __ieee754_sqrtl case
> fwprop turns this (example from the commit mes
Ilya Leoshkevich via Gcc-patches writes:
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563800.html
>
> v1 -> v2: Allow (mem) -> (subreg) propagation only for single uses.
>
> Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
> and s390x-redhat-linux. Ok for mas
For variable-length vectors, the N inside “vector(N) T” can
contain the characters ‘[’, ‘]’ and ‘,’.
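(For example, with SVE the element count is a runtime value and the dump can
print the type as something like "vector([4,4]) int"; the exact spelling is
illustrative, but it shows where the '[', ']' and ',' come from.)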
Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf
and x86_64-linux-gnu. Pushed as obvious.
Richard
gcc/testsuite/
* gcc.dg/vect/pr91750.c: Allow "[]," inside a vec
The XFAIL for variable-length vectors is no longer needed since
we can't build the required constant vector and so fall back to
fixed-length alternatives.
Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf
and x86_64-linux-gnu. Pushed as obvious.
Richard
gcc/testsuite/
For variable-length SVE, we can only use SLP for N scalars of type
T if the number of Ts in a vector is a multiple of N. For ints
this means that N must be 4 or 2, so this patch XFAILs two tests
for N==8.
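(Worked example of the divisibility condition, under the usual modelling of
SVE vectors: a vector of ints has 4 + 4n elements for some runtime n >= 0.
N must divide 4 + 4n for every n, and taking n = 0 forces N to divide 4, so
only N = 1, 2 or 4 work; hence the N == 8 tests are XFAILed.)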
The exact limit seems inherently target-specific -- variable-length
vectors with a 256-bit g
We don't yet support SLP inductions for variable-length vectors,
so this patch XFAILs some associated tests.
(Inductions aren't inherently difficult to support. It just hasn't
been done yet.)
Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf
and x86_64-linux-gnu. Pushed as
We can vectorise vect/pr65947-8.c for SVE, as we can for GCN.
Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf
and x86_64-linux-gnu. Pushed as obvious.
Richard
gcc/testsuite/
* gcc.dg/vect/pr65947-8.c: Expect the loop to be vectorized for SVE.
---
gcc/testsuite/
Because we disable the cost model, targets with variable-length
vectors can end up vectorising the store to a[0..7] on its own.
With the cost model we do something sensible.
Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf
and x86_64-linux-gnu. Pushed as obvious.
Richard
We don't try to increase the alignment of decls if
vect_element_align_preferred.
Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf
and x86_64-linux-gnu. Pushed as obvious.
Richard
gcc/testsuite/
* gcc.dg/vect/aligned-section-anchors-nest-1.c: XFAIL alignment
We still fall back to load/store-lanes for slp-46.c, if the target
supports it.
Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf
and x86_64-linux-gnu. Pushed as obvious.
Richard
gcc/testsuite/
* gcc.dg/vect/slp-46.c: XFAIL test for SLP on vect_load_lanes targets.
We're now able to vectorise the set-up loop:
int p = power2 (fns[i].po2);
for (int j = 0; j < N; j++)
a[j] = ((p << 4) * j) / (N - 1) - (p << 5);
Rather than adjust the expected output for that, it seemed better
to disable optimisation for the testing code.
Tested on aarch64-
We don't need an epilogue loop if the main loop can operate on
partial vectors, so this patch disables an associated test.
The alternative would be to force partial-vectors-usage=1
on the command line.
Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf
and x86_64-linux-gnu. O
SLP vectorisation of gcc.dg/vect/fast-math-vect-call-1.c involves
a group of 3 floats, which requires the same permutation as
vect_perm3_int.
The load/store_lanes XFAILs in gcc.dg/vect/slp-perm-6.c implicitly
assumed vect_perm3_int, which is true for Advanced SIMD but not for
VLA SVE. Whether it'
On arm* and aarch64* targets, we can vectorise the second of the main
loops using SLP, not just the third. As the comments say, whether this
is supported depends on a very specific permutation, so it seemed better
to use direct target selectors.
Tested on aarch64-linux-gnu (with and without SVE),
AArch64 passes the "not profitable" test because it treats vec_construct
as having a high-enough cost. This means that we can try other vector
modes, which in turn causes "BB vectorization with gaps at the end of
a load is not supported" to be printed more than once. The number of
times that we p
The vectorizable_call part of r11-1143 dropped the required
vectype when moving from vect_get_vec_def_for_operand to
vect_get_vec_defs_for_operand. This caused an ICE on the
testcase for SVE, because we ended up with a non-predicate
value being passed to a predicate input.
AFAICT this was the onl
These tests started passing a while ago, so remove the XFAILs.
Tested on aarch64-linux-gnu, pushed to trunk.
Richard
gcc/testsuite/
* gcc.target/aarch64/sve/cond_cnot_1.c: Remove XFAIL.
* gcc.target/aarch64/sve/cond_unary_1.c: Likewise.
---
gcc/testsuite/gcc.target/aarch64/sve/
Tamar Christina writes:
> Hi All,
>
> This is a partial backport for 0f801e0b6cc9f67c9a8983127e23161f6025c5b6 which
> fixes
> a truncation error for the inline memcopy on AArch64 on GCC-8.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for GCC-8?
OK, thanks.
Richard
>
acsaw...@linux.ibm.com writes:
> From: Aaron Sawdey
>
> Richard,
> Thanks for the review. I think I have resolved everything, as follows:
>
> * I was able to remove the const_tiny_rtx initialization for
> MODE_OPAQUE. If that becomes a problem it's a pretty simple matter to
> use an UNSPEC to a