From: Juzhe-Zhong
This patch is adding mask logic auto-vectorization.
define the pattern as "define_insn_and_split" to allow
combine PASS easily combine series instructions.
For example:
combine vmxor.mm + vmnot.m into vmxnor.mm
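The identity behind that combine can be sketched in plain C. This is an illustration only, not code from the patch: the function names are hypothetical, and the scalar loops merely model the mask operations element by element. Whether a given loop is actually vectorized to vmxnor.mm depends on the compiler and flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Two-step form: XOR, then NOT (models vmxor.mm followed by vmnot.m).  */
void mask_xor_then_not(const bool *a, const bool *b, bool *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      bool t = a[i] ^ b[i];
      out[i] = !t;
    }
}

/* Fused form: XNOR in one step (models vmxnor.mm).  */
void mask_xnor(const bool *a, const bool *b, bool *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (a[i] == b[i]);
}
```

Both functions produce the same output for every input, which is the equivalence a fused instruction relies on.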
Build success and regression PASS
Ok for trunk ?
gcc/ChangeLog:
Just one comment: define_insn_and_split should be used in this
scenario rather than define_insn_and_rewrite since you are not really
rewriting.
You can commit after updating to define_insn_and_split :)
On Wed, May 24, 2023 at 3:04 PM wrote:
>
> From: Juzhe-Zhong
>
> This patch is adding mask lo
Thanks, Kito.
I will change it to define_insn_and_split and send V2 soon.
juzhe.zh...@rivai.ai
From: Kito Cheng
Date: 2023-05-24 15:18
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add RVV mask logic auto-vectorization
Just one comment:
From: Juzhe-Zhong
This patch is adding mask logic auto-vectorization.
define the pattern as "define_insn_and_split" to allow
combine PASS easily combine series instructions.
For example:
combine vmxor.mm + vmnot.m into vmxnor.mm
Build success and regression PASS
And committed.
---
gcc/config
LGTM, just one comment on the commit message; no need for a V3, just committing
with the fix is fine :)
On Wed, May 24, 2023 at 3:28 PM wrote:
>
> From: Juzhe-Zhong
>
> This patch is adding mask logic auto-vectorization.
> define the pattern as "define_insn_and_split" to allow
don't forget to update here ^
>
> From: Juzhe-Zhong
>
> This patch is adding mask logic auto-vectorization.
> define the pattern as "define_insn_and_split" to allow
> don't forget to update here ^
I notice I missed the ChangeLog here. Is that what you want me to fix in the
commit log?
juzhe.zh...@rivai.ai
From: Kito Cheng
D
Oh, never mind. I mean you updated the code to use define_insn_and_split but
the comment still says define_insn_and_split; just ignore that if it is
already committed.
On Wed, May 24, 2023 at 3:42 PM juzhe.zh...@rivai.ai
wrote:
>
> >
> > From: Juzhe-Zhong
> >
> > This patch is adding mask logic auto-vectorization.
>
Committed, thanks Kito.
Pan
-----Original Message-----
From: Kito Cheng
Sent: Wednesday, May 24, 2023 4:08 PM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches ; Kito.cheng ;
palmer ; palmer ; jeffreyalaw
; Robin Dapp ; Li, Pan2
Subject: Re: Re: [V2 COMMITTED] RISC-V: Add RVV mask logic auto-vectori
On Wed, 24 May 2023, Richard Sandiford wrote:
> When I wrote early-remat, the DF_FORWARD block order was a postorder
> of a reverse/backward walk (i.e. of the inverted cfg), rather than a
> reverse postorder of a forward walk. A postorder of a backward walk
> lacked the important property that do
Hi all,
As the PR says we shouldn't be using qualifier_unsigned for the return type of
the __ssat intrinsics.
UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS already exists for that.
This was just a thinko.
This patch fixes this and the warning with -Wconversion goes away.
Bootstrapped and tested on
On Wed, 24 May 2023 at 06:52, Alexandre Oliva via Libstdc++ <
libstd...@gcc.gnu.org> wrote:
>
> Just as on aarch64, x86's wider long double experiences loss of
> precision with from_chars implemented in terms of double. Expect the
> execution fail.
>
> Bootstrapped on x86_64-linux-gnu. Also test
Wang Lei raised some concerns about Itanium C++ ABI, so let's ask a C++
expert here...
Jonathan: AFAIK the standard and the Itanium ABI treats an empty class
as size 1 in order to guarantee unique address, so for the following:
class Empty {};
class Test { Empty empty; double a, b; };
When we pa
On 2023/5/24 at 2:45 PM, Xi Ruoyao wrote:
On Wed, 2023-05-24 at 14:04 +0800, Lulu Cheng wrote:
An empty struct type that is not non-trivial for the purposes of calls
will be treated as though it were the following C type:
struct {
char c;
};
Before this patch was added, a structure parameter cont
On Tue, May 23, 2023 at 8:30 PM Roger Sayle wrote:
>
>
> PR middle-end/109840 is a regression introduced by my recent patch to
> fold popcount(bswap(x)) as popcount(x). When the bswap and the popcount
> have the same precision, everything works fine, but this optimization also
> allowed a zero-ex
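The same-precision case mentioned above can be illustrated with a short sketch. It assumes GCC or Clang (for `__builtin_popcount` and `__builtin_bswap32`) and a 32-bit `unsigned int`; the function names are hypothetical, and the mixed-precision case from the PR is not modelled here.

```c
#include <stdint.h>

/* popcount applied after a byte swap, at matching 32-bit precision.  */
int popcount_of_bswap32(uint32_t x)
{
  return __builtin_popcount(__builtin_bswap32(x));
}

/* The folded form: byte order does not change the number of set bits.  */
int popcount32(uint32_t x)
{
  return __builtin_popcount(x);
}
```

For any 32-bit input the two functions return the same value, because a byte swap only permutes bits.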
On Wed, 24 May 2023 at 09:41, Xi Ruoyao wrote:
> Wang Lei raised some concerns about Itanium C++ ABI, so let's ask a C++
> expert here...
>
> Jonathan: AFAIK the standard and the Itanium ABI treats an empty class
> as size 1
Only as a complete object, not as a subobject.
> in order to guarant
On Wed, May 24, 2023 at 1:16 AM Andrew Pinski via Gcc-patches
wrote:
>
> While trying to understand how to use the ! operand for match
> patterns, I noticed that the debug dumps would print out applying
> a pattern but nothing when it was rejected in the end. This was confusing
> me.
> This adds t
On Wed, May 24, 2023 at 7:17 AM Alexandre Oliva via Gcc-patches
wrote:
>
>
> tsvc tests all fail on systems that don't offer a malloc.h, other than
> those that explicitly rule that out. Use the preprocessor to test for
> malloc.h's availability.
>
> tsvc.h also expects a definition for struct ti
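One way to test for a header with the preprocessor is `__has_include`, sketched below. This is an assumption about the general technique, not a copy of what the patch does; the `SKETCH_` macro and the function name are hypothetical.

```c
#include <stdlib.h>

/* __has_include is supported by GCC and Clang; the nested #if keeps
   compilers that lack it from parsing the __has_include expression.  */
#if defined(__has_include)
# if __has_include(<malloc.h>)
#  include <malloc.h>
#  define SKETCH_HAVE_MALLOC_H 1
# endif
#endif

#ifndef SKETCH_HAVE_MALLOC_H
# define SKETCH_HAVE_MALLOC_H 0
#endif

/* Reports whether <malloc.h> was found at compile time.  */
int sketch_have_malloc_h(void)
{
  return SKETCH_HAVE_MALLOC_H;
}
```

Code that needs declarations from malloc.h can then be guarded with the same macro, and fall back to stdlib.h otherwise.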
On Wed, May 24, 2023 at 7:19 AM Alexandre Oliva via Gcc-patches
wrote:
>
>
> Fix test that uses -fPIC without stating the requirement for PIC
> support.
>
> Bootstrapped on x86_64-linux-gnu. Also tested on ppc- and x86-vx7r2
> with gcc-12.
OK.
> for gcc/testsuite/ChangeLog
>
> * gcc.ta
On Wed, May 24, 2023 at 7:20 AM Alexandre Oliva via Gcc-patches
wrote:
>
>
> Fix test that uses -fopenmp without declaring requirement for pthread
> support.
>
> Bootstrapped on x86_64-linux-gnu. Also tested on ppc- and x86-vx7r2
> with gcc-12.
OK
> for gcc/testsuite/ChangeLog
>
> * g+
On Wed, May 24, 2023 at 7:21 AM Alexandre Oliva via Gcc-patches
wrote:
>
>
> Fix two tests that use -pg but don't declare their requirement for
> profiling support.
>
> Bootstrapped on x86_64-linux-gnu. Also tested on ppc- and x86-vx7r2
> with gcc-12.
OK.
> for gcc/testsuite/ChangeLog
>
>
On Wed, May 24, 2023 at 7:40 AM Alexandre Oliva via Gcc-patches
wrote:
>
> On May 5, 2022, Alexandre Oliva wrote:
>
> > for gcc/ChangeLog
>
> > PR target/100106
> > * emit-rtl.cc (validate_subreg): Reject a SUBREG of a MEM that
> > requires stricter alignment than MEM's.
>
> >
On Wed, May 24, 2023 at 7:47 AM Alexandre Oliva via Gcc-patches
wrote:
>
>
> MOVE_MAX on x86* used to accept up to 16 bytes, even without SSE,
> which enabled inlining of small memmove by loading and then storing
> the entire range. After the "x86: Update piecewise move and store"
> r12-2666 chan
On Wed, 2023-05-24 at 16:47 +0800, Lulu Cheng wrote:
>
> On 2023/5/24 at 2:45 PM, Xi Ruoyao wrote:
> > On Wed, 2023-05-24 at 14:04 +0800, Lulu Cheng wrote:
> > > An empty struct type that is not non-trivial for the purposes of
> > > calls
> > > will be treated as though it were the following C type:
> > >
On Mon, 22 May 2023 at 14:18, Richard Sandiford
wrote:
>
> Prathamesh Kulkarni writes:
> > Hi Richard,
> > Thanks for the suggestions. Does the attached patch look OK ?
> > Bootstrap+test in progress on aarch64-linux-gnu.
>
> Like I say, please wait for the tests to complete before sending an RFA.
On Tue, May 23, 2023 at 2:56 PM Georg-Johann Lay wrote:
>
> PR target/104327 not only affects s390 but also avr:
> The avr backend pre-sets some options depending on optimization level.
> The inliner then thinks that always_inline functions are not eligible
> for inlining and terminates with an er
Patch V2: adds new patch.
Patch V3: `%{mmips16e2} \` was put in the wrong place in the first patch;
V3 fixes it.
The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
which includes all MIPS16e instructions, with some additions.
This series of patches adds all instructions from MIPS16E2 ASE
with correspondin
There are shortened bitwise instructions in the mips16e2 ASE,
for instance ANDI, ORI/XORI, EXT, INS, etc.
This patch adds these instructions with corresponding tests.
gcc/ChangeLog:
* config/mips/constraints.md(Yz): New constraints for mips16e2.
* config/mips/mips-protos.h(mips_
This patch adds LUI instruction from mips16e2
with corresponding test.
gcc/ChangeLog:
* config/mips/mips.cc(mips_symbol_insns_1): Generates LUI instruction.
(mips_const_insns): Same as above.
(mips_output_move): Same as above.
(mips_output_function_prologue): Same
This patch adds MOVx instructions from mips16e2
(movn,movz,movtn,movtz) with corresponding tests.
gcc/ChangeLog:
* config/mips/mips.h(ISA_HAS_CONDMOVE): Add condition for
ISA_HAS_MIPS16E2.
* config/mips/mips.md(*mov_on_): Add logics for
MOVx insts.
(*mov_on__mips16e2): G
The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
which includes all MIPS16e instructions, with some additions.
It defines new special instructions for increasing
code density (e.g. Extend, PC-relative instructions, etc.).
This patch adds basic support for mips16e2 used by the
following series
This patch adds LWL/LWR, SWL/SWR instructions with their
corresponding tests.
gcc/ChangeLog:
* config/mips/mips.cc(mips_expand_ins_as_unaligned_store):
Add logics for generating instruction.
* config/mips/mips.h(ISA_HAS_LWL_LWR): Add clause for ISA_HAS_MIPS16E2.
*
The mips16e2 ASE uses eight general-purpose registers
from mips32, with some special-purpose registers,
these registers are GPRs: s0-1, v0-1, a0-3, and
special registers: t8, gp, sp, ra.
As mentioned above, the special register gp is
used in mips16e2, which is the global pointer register,
it is us
The MIPS16e2 ASE has PREF, LL and SC instructions;
they use 9-bit immediates, like mips32r6.
MIPS32 pre-R6 uses 16-bit immediates.
gcc/ChangeLog:
* config/mips/mips.h(ISA_HAS_9BIT_DISPLACEMENT): Add clause
for ISA_HAS_MIPS16E2.
(ISA_HAS_SYNC): Same as above.
(I
This patch adds CACHE instruction from mips16e2
with corresponding tests.
gcc/ChangeLog:
* config/mips/mips.c(mips_9bit_offset_address_p): Restrict the
address register to M16_REGS for MIPS16.
(BUILTIN_AVAIL_MIPS16E2): Defined a new macro.
(AVAIL_MIPS16E2_OR_NON_MI
This patch makes mips16e2 generate ZEB/ZEH instead of ANDI
under the -O0 option, the same as at -O1 to -O3,
which shrinks the code size.
gcc/ChangeLog:
* config/mips/mips.md(*and3_mips16): Generates
ZEB/ZEH instructions.
---
gcc/config/mips/mips.md | 30 +
On Tue, May 23, 2023 at 5:05 PM wrote:
>
> From: Juzhe-Zhong
>
> This patch enable RVV auto-vectorization including floating-point
> unorder and order comparison.
>
> The testcases are leveraged from Richard.
> So include Richard as co-author.
>
> Co-Authored-By: Richard Sandiford
>
> gcc/Change
Thanks for the update. Mostly LGTM, just some minor things left below.
Oluwatamilore Adebayo writes:
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index
> a49b09539776c0056e77f99b10365d0a8747fbc5..3a2248263cf67834a1cb41167a1783a3b6400014
> 100644
> --- a/gcc/tree-vect-
Hi,
on the attached testcase, the Ada compiler gives a bogus warning:
storage_offset1.ads:16:52: warning: Constraint_Error will be raised at run
time [enabled by default]
This directly comes from the GENERIC folding setting a bogus TREE_OVERFLOW on
an INTEGER_CST during the (T)P - (T)(P + A) ->
Richard Biener writes:
> On Tue, May 23, 2023 at 5:05 PM wrote:
>>
>> From: Juzhe-Zhong
>>
>> This patch enable RVV auto-vectorization including floating-point
>> unorder and order comparison.
>>
>> The testcases are leveraged from Richard.
>> So include Richard as co-author.
>>
>> Co-Authored-B
On 2023/5/24 at 5:25 PM, Xi Ruoyao wrote:
On Wed, 2023-05-24 at 16:47 +0800, Lulu Cheng wrote:
On 2023/5/24 at 2:45 PM, Xi Ruoyao wrote:
On Wed, 2023-05-24 at 14:04 +0800, Lulu Cheng wrote:
An empty struct type that is not non-trivial for the purposes of
calls
will be treated as though it were the following
Hi Richard,
On Tue, 23 May 2023 at 11:55, Richard Biener via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:
> The following fixes code hoisting to properly consider ANTIC_OUT instead
> of ANTIC_IN. That's a bit expensive to re-compute but since we no
> longer iterate we're doing this only once pe
Prathamesh Kulkarni writes:
> On Mon, 22 May 2023 at 14:18, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > Hi Richard,
>> > Thanks for the suggestions. Does the attached patch look OK ?
>> > Bootstrap+test in progress on aarch64-linux-gnu.
>>
>> Like I say, please wait for the
The following dispatches to V2DImode CTOR expansion instead of
using sets of (subreg:DI (reg:V16QI 146) [08]) which causes
LRA to spill DImode and reload V16QImode. The same applies for
V8QImode or V4HImode construction from SImode parts which happens
during 32bit libgcc build.
Bootstrapped and te
On Wed, 24 May 2023, Christophe Lyon wrote:
> Hi Richard,
>
> On Tue, 23 May 2023 at 11:55, Richard Biener via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
> > The following fixes code hoisting to properly consider ANTIC_OUT instead
> > of ANTIC_IN. That's a bit expensive to re-compute but
OK for master and all branches? (this issue only surfaced because of the new
test)
8< -
On ARM, NEON doesn't support double, so __is_intrinsic_type_v should say false (instead of being ill-formed).
Signed-off-by: Matthias Kretz
libstdc++-v3/ChangeLog:
PR l
On Wed, 24 May 2023 at 11:59, Matthias Kretz via Libstdc++ <
libstd...@gcc.gnu.org> wrote:
> OK for master and all branches? (this issue only surfaced because of the
> new
> test)
>
OK.
>
> 8< -
>
> On ARM NEON doesn't support double, so __is_intrinsic_type_v whate
On Tue, 23 May 2023 at 22:57, Matthias Kretz via Libstdc++ <
libstd...@gcc.gnu.org> wrote:
>
> Signed-off-by: Matthias Kretz
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/109261
> * include/experimental/bits/simd_neon.h (_S_reduce): Add
> constexpr and make NEON implementat
On Wed, May 24, 2023 at 11:56 AM Eric Botcazou via Gcc-patches
wrote:
>
> Hi,
>
> on the attached testcase, the Ada compiler gives a bogus warning:
> storage_offset1.ads:16:52: warning: Constraint_Error will be raised at run
> time [enabled by default]
>
> This directly comes from the GENERIC fold
The PR109849 fix made us no longer hoist some memory loads because
of the expression set intersection. We can still avoid computing
the union by simply taking the first set's expressions and leaving
the pruning of expressions with values not suitable for hoisting
to sorted_array_from_bitmap_set.
Bo
From: Juzhe-Zhong
An obvious fix to make all enum naming consistent.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum frm_field_enum): Add FRM_ prefix.
---
gcc/config/riscv/riscv-protos.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/config/riscv/riscv-prot
ok
On Wed, May 24, 2023 at 7:20 PM wrote:
>
> From: Juzhe-Zhong
>
> An obvious fix to make all enum naming consistent.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (enum frm_field_enum): Add FRM_ prefix.
>
> ---
> gcc/config/riscv/riscv-protos.h | 2 +-
> 1 file changed, 1 inser
On Wed, May 24, 2023 at 11:57 AM Richard Sandiford
wrote:
>
> Richard Biener writes:
> > On Tue, May 23, 2023 at 5:05 PM wrote:
> >>
> >> From: Juzhe-Zhong
> >>
> >> This patch enable RVV auto-vectorization including floating-point
> >> unorder and order comparison.
> >>
> >> The testcases are
Sorry for the slow review. I needed some time to go through this
patch and surrounding code to understand it, and to understand
why it wasn't structured the way I was expecting.
I've got some specific comments below, and then a general comment
about how I think we should structure this.
juzhe.zh
From: Juzhe-Zhong
According to RVV ISA:
The conversions use the dynamic rounding mode in frm, except for the rtz
variants, which round towards zero.
So rtz conversion patterns should not have FRM dependency.
We can't support mode switching for FRM yet since rvv intrinsic doc is not
updated bu
Committed, thanks Kito.
Pan
-----Original Message-----
From: Gcc-patches On Behalf
Of Kito Cheng via Gcc-patches
Sent: Wednesday, May 24, 2023 7:21 PM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; kito.ch...@sifive.com; pal...@rivosinc.com;
rdapp@gmail.com; jeffreya...@gmail.com
Su
Hi, Richard. It's quite complicated for me and I am not sure whether I can
catch up with you.
So I will rather split the work step by step to implement the decrement IV
For the first step you mentioned:
>> (1) In vect_set_loop_condition_partial_vectors, for the first iteration of:
>> FOR_EA
Hi, Richard.
For step 1, I have written this patch. Could you take a look at it?
Thanks.
juzhe.zh...@rivai.ai
From: Richard Sandiford
Date: 2023-05-24 19:23
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V12] VECT: Add decrement IV iteration loop control by
variable amount supp
Call expandargv prior to attempting to prepend a dash to the first
argument. When using response files the first character is never a dash
but an at-sign.
PR gcc/77576
gcc/ChangeLog:
* gcc-ar.cc (main): Call expandargv.
---
gcc/gcc-ar.cc | 2 ++
1 file changed, 2 insertions(+)
d
On Wed, May 24, 2023 at 12:13 PM Richard Biener wrote:
>
> The following dispatches to V2DImode CTOR expansion instead of
> using sets of (subreg:DI (reg:V16QI 146) [08]) which causes
> LRA to spill DImode and reload V16QImode. The same applies for
> V8QImode or V4HImode construction from SImode
> I don't like littering the patterns with this and it's likely far from the
> only cases we have?
Maybe, but that's the only problematic case we have in Ada. It occurs only on
mainline because we have streamlined address calculations there, from out-of-
line to inline expansion, i.e. from run t
Sorry, I realised later that I had an implicit assumption here:
if there are multiple rgroups, it's better to have a single IV
for the smallest rgroup and scale that up to bigger rgroups.
E.g. if the loop control IV is taken from an N-control rgroup
and has a step S, an N*M-control rgroup would be
Instead of defaulting to an initial value of VARYING before resolving
cycles, try folding the statement using available global values
instead. This can give us a much better initial approximation,
especially in cases where there are no dependencies, i.e.
f_45 = 77
This implements suggestion
This patch implements suggestion 1) from the PR:
1) We unconditionally write the new value calculated to the global
cache once the dependencies are resolved. This gives it a new
timestamp, and thus makes any other values which used it out of date
when they really aren't. This cause
This implements suggestion 3) from the PR:
3) When we first set the initial value for _1947 and give it the
ALWAYS_CURRENT timestamp, we lose the context of when the initial
value was set. So even with 1) & 2) implemented, we *still*
need to set a timestamp for it when it's finally
On Wed, 24 May 2023 at 12:41, Richard Biener wrote:
> On Wed, 24 May 2023, Christophe Lyon wrote:
>
> > Hi Richard,
> >
> > On Tue, 23 May 2023 at 11:55, Richard Biener via Gcc-patches <
> > gcc-patches@gcc.gnu.org> wrote:
> >
> > > The following fixes code hoisting to properly consider ANTIC_OUT
On Wed, 24 May 2023, Richard Sandiford wrote:
> Sorry, I realised later that I had an implicit assumption here:
> if there are multiple rgroups, it's better to have a single IV
> for the smallest rgroup and scale that up to bigger rgroups.
>
> E.g. if the loop control IV is taken from an N-contro
Explicitly say that bitwise shifts for narrow types work similar to
element-wise C shifts with integer promotions, which coincides with
OpenCL semantics.
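As a rough sketch of that statement (an illustration, not part of the patch), the code below compares a GCC vector shift on 16-bit elements with a scalar loop that promotes each element to int, shifts, and truncates back. The type and function names are hypothetical, `int` is assumed to be 32 bits, and shift counts are assumed to lie in the range 0 to 15; behaviour outside that range is not modelled.

```c
/* Four unsigned 16-bit elements, using the GCC vector extension.  */
typedef unsigned short v4hu __attribute__ ((vector_size (8)));

/* Vector form: << is applied to each pair of elements.  */
v4hu shift_vector(v4hu v, v4hu w)
{
  return v << w;
}

/* Scalar reference: each element is promoted to int, shifted,
   then truncated back to 16 bits.  */
void shift_elementwise(const unsigned short *v, const unsigned short *w,
                       unsigned short *out)
{
  for (int i = 0; i < 4; i++)
    out[i] = (unsigned short) (v[i] << w[i]);
}
```

For in-range shift counts the two forms agree element by element, including when bits are shifted out of the top of a 16-bit element.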
gcc/ChangeLog:
* doc/extend.texi (Vector Extensions): Clarify bitwise shift
semantics.
---
gcc/doc/extend.texi | 7 ++-
1
Joseph,
Thanks a lot for the review. And sorry for my late reply (just came back from a
short vacation).
> On May 19, 2023, at 5:12 PM, Joseph Myers wrote:
>
> On Fri, 19 May 2023, Qing Zhao via Gcc-patches wrote:
>
>> +GCC extension accepts a structure containing an ISO C99 @dfn{flexible arr
On Wed, May 24, 2023 at 2:39 PM Eric Botcazou wrote:
>
> > I don't like littering the patterns with this and it's likely far from the
> > only cases we have?
>
> Maybe, but that's the only problematic case we have in Ada. It occurs only on
> mainline because we have streamlined address calculatio
On Wed, May 24, 2023 at 2:54 PM Alexander Monakov via Gcc-patches
wrote:
>
> Explicitly say that bitwise shifts for narrow types work similar to
> element-wise C shifts with integer promotions, which coincides with
> OpenCL semantics.
Do we need to clarify that v << w with v being a vector of sho
>> In other words, why is this different from what
>>vect_set_loop_controls_directly would do?
Oh, I see. You are wondering why I do not do the multiple-rgroup vec_trunk
handling inside "vect_set_loop_controls_directly".
Well. Frankly, I just replicate the handling of ARM SVE:
unsigned int nmas
OK. Thanks. I am gonna refine the patch following Richard's idea and test it.
Thanks both Richard and Richi.
juzhe.zh...@rivai.ai
From: Richard Biener
Date: 2023-05-24 20:51
To: Richard Sandiford
CC: 钟居哲; gcc-patches
Subject: Re: [PATCH V12] VECT: Add decrement IV iteration loop control by
va
Hi all,
Continuing the series of straightforward annotations, this one handles the
normal (not widening or narrowing) vector shifts.
Tests included.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill
gcc/ChangeLog:
PR target/9919
钟居哲 writes:
>>> In other words, why is this different from what
>>>vect_set_loop_controls_directly would do?
> Oh, I see. You are wondering why I do not do the multiple-rgroup vec_trunk
> handling inside "vect_set_loop_controls_directly".
>
> Well. Frankly, I just replicate the handling of ARM
>> Both approaches are fine. I'm not against one or the other.
>> What I didn't understand was why your patch only reuses existing IVs
>> for max_nscalars_per_iter == 1. Was it to avoid having to do a
>> multiplication (well, really a shift left) when moving from one
>> rgroup to another? E.g.
Hello,
On Wed, May 17 2023, Aldy Hernandez wrote:
> This patch encapsulates the ipa_vr internals into an API. It also
> makes it type agnostic, in preparation for upcoming changes to IPA.
>
> Interestingly, there's a 0.44% improvement to IPA-cp, which I'm sure
> we'll soak up with future changes
>> Actually, I just want to handle multiple rgroups for non-SLP here, I am trying
>> to avoid multiplication and I think
>> scalar multiplication (not cost too much) is fine in modern CPU.
Sorry for the typo. I didn't try to avoid multiplication, and I think
multiplication is fine.
juzhe.zh.
Also, move vv8qi3 expander to a better place and enable
it with TARGET_MMX_WITH_SSE. Remove handling of V8QImode from
ix86_expand_vecop_qihi2 since all partial QI->HI vector modes expand
via ix86_expand_vecop_qihi_partial.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2)
On Wed, 24 May 2023, Richard Biener wrote:
> On Wed, May 24, 2023 at 2:54 PM Alexander Monakov via Gcc-patches
> wrote:
> >
> > Explicitly say that bitwise shifts for narrow types work similar to
> > element-wise C shifts with integer promotions, which coincides with
> > OpenCL semantics.
>
>
Bernhard,
Thanks a lot for your comments.
> On May 19, 2023, at 7:11 PM, Bernhard Reutner-Fischer
> wrote:
>
> On Fri, 19 May 2023 20:49:47 +
> Qing Zhao via Gcc-patches wrote:
>
>> GCC extension accepts the case when a struct with a flexible array member
>> is embedded into another stru
Oh, I just realized that the flow you designed works well for vec_pack_trunk too.
Will send V13 patch soon.
Thanks.
juzhe.zh...@rivai.ai
From: 钟居哲
Date: 2023-05-24 22:10
To: richard.sandiford
CC: gcc-patches; rguenther
Subject: Re: Re: [PATCH V12] VECT: Add decrement IV iteration loop control
From: Ju-Zhe Zhong
This patch is supporting decrement IV by following the flow designed by Richard:
(1) In vect_set_loop_condition_partial_vectors, for the first iteration of:
call vect_set_loop_controls_directly.
(2) vect_set_loop_controls_directly calculates "step" as in your patch.
If rg
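The general shape of a decrementing IV that steps by a variable amount can be sketched in scalar C. This is a simplified model under stated assumptions, not the vectorizer's implementation: the per-iteration length is taken to be the minimum of the remaining count and a fixed maximum `vf` (as the MIN_EXPR in the dumps later in the thread suggests), `vf` must be greater than zero, and the function name is hypothetical.

```c
#include <stddef.h>

/* Adds 1 to each of N elements using a length-controlled loop.
   VF is an assumed maximum number of elements handled per iteration
   and must be nonzero.  Returns the number of iterations executed.  */
size_t add_one_decrement_iv(int *data, size_t n, size_t vf)
{
  size_t remaining = n;          /* the decrementing IV */
  size_t iterations = 0;

  while (remaining > 0)
    {
      /* MIN (remaining, vf): elements to process this iteration.  */
      size_t len = remaining < vf ? remaining : vf;

      /* Stands in for one vector operation of length LEN.  */
      for (size_t i = 0; i < len; i++)
        data[i] += 1;

      data += len;
      remaining -= len;          /* decrement by the variable amount */
      iterations++;
    }
  return iterations;
}
```

With `n = 10` and `vf = 4` the loop runs three times, with lengths 4, 4 and 2; the last iteration handles the tail without a separate epilogue.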
Hi. Richard. I have sent V13:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619475.html
It looks more reasonable now.
Could you continue reviewing it?
Thanks.
juzhe.zh...@rivai.ai
From: Richard Sandiford
Date: 2023-05-24 22:01
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V12
> But nobody is going to understand why the INTEGER_CST case goes the
> other way.
I can add a fat comment to that effect of course. :-)
> As you say we don't have a good way to say we're doing
> this to avoid undefined behavior, but then a view-convert back would
> be a good way to indicate that
From: Ju-Zhe Zhong
This patch is supporting decrement IV by following the flow designed by Richard:
(1) In vect_set_loop_condition_partial_vectors, for the first iteration of:
call vect_set_loop_controls_directly.
(2) vect_set_loop_controls_directly calculates "step" as in your patch.
If rg
Forget about V13. Please review V14 directly.
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619478.html
Thanks.
juzhe.zh...@rivai.ai
From: juzhe.zhong
Date: 2023-05-24 22:29
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH V13] VECT: Add decrement IV iterat
On Wed, 2023-05-24 at 18:07 +0800, Lulu Cheng wrote:
>
> On 2023/5/24 at 5:25 PM, Xi Ruoyao wrote:
> > On Wed, 2023-05-24 at 16:47 +0800, Lulu Cheng wrote:
> > > On 2023/5/24 at 2:45 PM, Xi Ruoyao wrote:
> > > > On Wed, 2023-05-24 at 14:04 +0800, Lulu Cheng wrote:
> > > > > An empty struct type that is not non-tr
钟居哲 writes:
>>> Both approaches are fine. I'm not against one or the other.
>
>>> What I didn't understand was why your patch only reuses existing IVs
>>> for max_nscalars_per_iter == 1. Was it to avoid having to do a
>>> multiplication (well, really a shift left) when moving from one
>>> rgroup
Yeah. Thanks. I have sent V14:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619478.html
in which I found that there is no distinction between SLP and non-SLP.
Could you review it? I think it's more reasonable now.
Thanks.
juzhe.zh...@rivai.ai
From: Richard Sandiford
Date: 2023-05-24 22:57
To:
OK for master and backports? (also a long-standing bug that didn't surface
until the new constexpr test was added)
tested on powerpc64le-linux-gnu
- 8< -
Signed-off-by: Matthias Kretz
libstdc++-v3/ChangeLog:
PR libstdc++/109949
* include/experiment
Thanks for trying it. I'm still surprised that no multiplication
is needed though. Does the patch work for:
short x[100];
int y[200];
void f() {
  for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
    x[i + 0] += 1;
    x[i + 1] += 2;
    y[j + 0] += 1;
    y[j + 1] += 2;
    y[j + 2] += 3;
Hi, the .optimized dump is like this:
[local count: 21045336]:
ivtmp.26_36 = (unsigned long) &x;
ivtmp.27_3 = (unsigned long) &y;
ivtmp.30_6 = (unsigned long) &MEM [(void *)&y + 16B];
ivtmp.31_10 = (unsigned long) &MEM [(void *)&y + 32B];
ivtmp.32_14 = (unsigned long) &MEM [(void *
On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote:
> on 2023/5/24 06:30, Peter Bergner wrote:
> > On 5/23/23 12:24 AM, Kewen.Lin wrote:
> > > on 2023/5/23 01:31, Carl Love wrote:
> > > > The builtins were requested for use in GLibC. As of version
> > > > 2.31 they
> > > > were added as inline asm
钟居哲 writes:
> Hi, the .optimized dump is like this:
>
>[local count: 21045336]:
> ivtmp.26_36 = (unsigned long) &x;
> ivtmp.27_3 = (unsigned long) &y;
> ivtmp.30_6 = (unsigned long) &MEM [(void *)&y + 16B];
> ivtmp.31_10 = (unsigned long) &MEM [(void *)&y + 32B];
> ivtmp.32_14 = (u
Hi, Richard.
I think it can work after I analyze it.
Let's take a look at the code:
void f() {
  for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
    x[i + 0] += 1;
    x[i + 1] += 2;
    y[j + 0] += 1;
    y[j + 1] += 2;
    y[j + 2] += 3;
    y[j + 3] += 4;
  }
}
For "x", each scalar iteration
Hi, Richard. I still don't understand it. Sorry about that.
>> loop_len_48 = MIN_EXPR ;
>> _74 = loop_len_34 * 2 - loop_len_48;
I have already run the tests.
We have a MIN_EXPR to calculate the total elements:
loop_len_34 = MIN_EXPR ;
I think "8" is already multiplied by 2?
Why do we n
On 24.05.23 at 11:38, Richard Biener wrote:
On Tue, May 23, 2023 at 2:56 PM Georg-Johann Lay wrote:
PR target/104327 not only affects s390 but also avr:
The avr backend pre-sets some options depending on optimization level.
The inliner then thinks that always_inline functions are not eligi
钟居哲 writes:
> Hi, Richard. I still don't understand it. Sorry about that.
>
>>> loop_len_48 = MIN_EXPR ;
>>> _74 = loop_len_34 * 2 - loop_len_48;
>
> I have already run the tests.
> We have a MIN_EXPR to calculate the total elements:
> loop_len_34 = MIN_EXPR ;
> I think "8" is already mul
Oh, I see. Thank you so much for pointing this out.
Could you tell me what I should do in the code?
It seems that I should adjust it in
vect_adjust_loop_lens_control
and multiply by some factor? Is it correct to multiply by
max_nscalars_per_iter?
Thanks.
juzhe.zh...@rivai.ai
From: Richard Sandiford
钟居哲 writes:
> Oh, I see. Thank you so much for pointing this out.
> Could you tell me what I should do in the code?
> It seems that I should adjust it in
> vect_adjust_loop_lens_control
>
> and multiply by some factor? Is it correct to multiply by
> max_nscalars_per_iter?
max_nscalars_per_iter * factor