Hi,
For a function with the target attribute arch=*, the current logic sets its
tune to the -mtune value from the command line, so all target_clones get the
same tuning flags, which can hurt the performance of each clone. Override
tune with arch if tune was not explicitly specified to get proper tuning
flags for
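For illustration, a minimal sketch of the kind of function this affects; the clone arch names below are examples, not taken from the patch:

/* Each clone is built for its own arch=*; before the fix every clone
   inherited the command-line -mtune instead of tuning for its arch.  */
__attribute__((target_clones ("default", "arch=icelake-server", "arch=znver3")))
int
sum (const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}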
Hi,
For functions with different target attributes, the current logic refuses to
inline the callee when any arch or tune is mismatched. Relax the
condition to honor just prefer_vector_width_type and other flags that
may cause safety issues, so the caller can get more optimization opportunities.
Bootstrapped/r
Thanks, I'll backport it down to GCC 10 after it passes bootstrap/regtest.
Uros Bizjak via Gcc-patches wrote on Mon, Jun 26, 2023 at 14:05:
>
> On Mon, Jun 26, 2023 at 4:31 AM Hongyu Wang wrote:
> >
> > Hi,
> >
> > For function with target attribute arch=*, current logic will set its
> > tune to -mtune f
> I don't think this is desirable. If we inline something with different
> ISAs, we get some strange mix of ISAs when the function is inlined.
> OTOH - we already inline with mismatched tune flags if the function is
> marked with always_inline.
Previously ix86_can_inline_p has
if (((caller_opts->
The testcase fails with a --with-arch=native build on cascadelake; here
is the patch to adjust it.
gcc/testsuite/ChangeLog:
* gcc.target/i386/mvc17.c: Add -march=x86-64 to dg-options.
---
gcc/testsuite/gcc.target/i386/mvc17.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/t
> If the user specified a different arch for callee than the caller,
> then the compiler will switch on different ISAs (-march is just a
> shortcut for different ISA packs), and the programmer is aware that
> inlining isn't intended here (we have -mtune, which is not as strong
> as -march, but even
Hi,
For functions with different target attributes, the current logic refuses to
inline the callee when any arch or tune is mismatched. Relax the
condition to allow a callee with default arch/tune to be inlined.
Bootstrapped/regtested on x86-64-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
*
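A hedged sketch of the situation the relaxed check is about; the attribute value is only an example:

/* The callee keeps the default arch/tune; with the relaxed condition it
   can be inlined into a caller that sets its own arch via attribute.  */
static int
callee (int x)
{
  return x * 3;
}

__attribute__((target ("arch=skylake-avx512")))
int
caller (int x)
{
  return callee (x) + 1;
}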
> In a follow-up patch, can you please document inlining rules involving
> -march and -mtune to "x86 Function Attributes" section? Currently, the
> inlining rules at the end of "target function attribute" section does
> not even mention -march and -mtune. Maybe a subsubsection "Inlining
> rules" sh
Thanks, this is the updated patch I'm going to check in.
Uros Bizjak wrote on Tue, Jul 4, 2023 at 16:57:
>
> On Tue, Jul 4, 2023 at 10:32 AM Hongyu Wang wrote:
> >
> > > In a follow-up patch, can you please document inlining rules involving
> > > -march and -mtune to "x86 Function Attributes" section? Current
Hi,
This is a follow-up patch for
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623525.html
that updates the documentation of the x86 inlining rules.
Ok for trunk?
gcc/ChangeLog:
* doc/extend.texi: Move x86 inlining rule to a new subsubsection
and add description for inlining of funct
Hi,
When OMP_WAIT_POLICY is not specified, the current implementation leaves the
ICV flag GOMP_ICV_WAIT_POLICY unset, so the global variable wait_policy
retains its uninitialized value. Set it to -1 when the flag is not
specified to keep the GOMP_SPINCOUNT behavior consistent with its description.
Bootst
> I think the right spot to fix this would be instead in initialize_icvs,
> change the
> icvs->wait_policy = 0;
> in there to
> icvs->wait_policy = -1;
> That way it will be the default for all the devices, not just the
> initial one.
It doesn't work, because the code that determines the value of wait
Hongyu Wang wrote on Wed, Mar 8, 2023 at 16:07:
>
> > I think the right spot to fix this would be instead in initialize_icvs,
> > change the
> > icvs->wait_policy = 0;
> > in there to
> > icvs->wait_policy = -1;
> > That way it will be the default for all the devices, not just the
> > initial one.
>
> It do
> Seems for many ICVs the default values are done through
> gomp_default_icv_values, but that doesn't cover wait_policy.
> For other vars, the defaults are provided through just initializers of
> those vars on the var definitions, e.g.:
> char *gomp_affinity_format_var = "level %L thread %i affinit
Hi,
This patch fixes some typos in the amxbf16-dpbf16ps-2 test.
Tested on SDE and an SPR machine; all tests passed.
OK for master and backport to GCC 11?
gcc/testsuite/ChangeLog:
* gcc.target/i386/amxbf16-dpbf16ps-2.c: Fix typos.
---
gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c | 6 +++---
1 fi
Hi,
For lea + zero_extendsidi insns, if the dest of the lea and the src of the zext
are the same, combine them into a single leal on 64-bit targets, since the
32-bit register will be automatically zero-extended.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for master?
gcc/ChangeLog:
PR target/101
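A minimal sketch (not the committed testcase) of source that produces the lea + zero_extendsidi pair:

/* On 64-bit targets, writing the 32-bit result with leal already
   zero-extends into the full register, so the separate movl generated
   for the zero_extendsidi can be combined away.  */
unsigned long long
f (unsigned int a, unsigned int b)
{
  return a + b * 4;   /* expected: a single leal (%rdi,%rsi,4), %eax */
}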
Sorry for the typo, scan-assembler should be
+/* { dg-final { scan-assembler "leal\[\\t \]\[^\\n\]*eax" } } */
+/* { dg-final { scan-assembler-not "movl\[\\t \]\[^\\n\]*eax" } } */
Hongyu Wang via Gcc-patches wrote on Fri, Aug 13, 2021 at 8:49 AM:
>
> Hi,
>
> For lea + zero_
> So, the question is if the combine pass really needs to zero-extend
> with 0xfffe, the left shift << 1 guarantees zero in the LSB, so
> 0x should be better and in line with canonical zero-extension
> RTX.
The shift mask is generated in simplify_shift_const_1:
mask_rtx = gen_int_mode
Hi Uros,
Sorry for the late update. I have tried adjusting the combine pass but
found it is not easy to modify shift const, so I came up with an
alternative solution with your patch. It matches the non-canonical
zero-extend in ix86_decompose_address and adjusts ix86_rtx_cost to
combine the below patter
Tamar Christina wrote on Sat, Sep 12, 2020 at 1:39 AM:
> Hi Martin,
>
> >
> > can you please confirm that the difference between these two is all due
> to
> > the last option -fno-inline-functions-called-once ? Is LTo necessary?
> I.e., can
> > you run the benchmark also built with the branch compiler and
> -m
> > new file mode 100644
> > > index 000..605a44df3f8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > > @@ -0,0 +1,4 @@
> > > +/* { dg-do assemble { target { ! ia32 } } } */
> > >
Hi:
This patch adds support for the Intel Key Locker extension.
Key Locker provides a mechanism to encrypt and decrypt data with an AES key
without having access to the raw key value.
For more details, please refer to
https://software.intel.com/content/dam/develop/external/us/en/documents/343965-
Thanks! I'll ask my colleague to help check in the patch.
Kirill Yukhin wrote on Mon, Sep 28, 2020 at 7:38 PM:
> Hello,
>
> On Sep 12 01:00, Hongyu Wang wrote:
> > Hi
> >
> > Thanks for your review, and sorry for the late reply. It took a while
> > to finish the runtime test.
>
> Thanks for your fixes! The pa
Hi,
Some x86 intrinsic headers are missing FSF copyright notes. This patch adds
the missing notes for those headers.
OK for master?
gcc/ChangeLog:
* config/i386/amxbf16intrin.h: Add FSF copyright notes.
* config/i386/amxint8intrin.h: Ditto.
* config/i386/amxtileintrin.h: Ditto.
* config/i386/avx51
Thanks for the fix! I forgot that we don't have a builtin check in
target-supports.exp.
Will update these once we implement AMX with builtins.
Jakub Jelinek wrote on Wed, Sep 30, 2020 at 7:51 PM:
> On Fri, Sep 18, 2020 at 04:31:55PM +0800, Hongyu Wang via Gcc-patches
> wrote:
> > Very App
Use the avx2-check mechanism to avoid an illegal instruction on non-AVX2 targets.
Tested by Rainer Orth on Solaris/x86. Pushed to trunk as obvious fix.
gcc/testsuite/ChangeLog:
PR target/104726
* gcc.target/i386/pr104551.c: Use avx2-check.h.
---
gcc/testsuite/gcc.target/i386/pr104551.c |
Use a standard C type instead of __int64_t, which doesn't work on Solaris.
Tested by Rainer Orth on Solaris/x86. Pushed to trunk as obvious fix.
gcc/testsuite/ChangeLog:
PR target/104724
* gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c: Use long long
instead of __int64_t.
Hi,
This patch fixes a typo in the subst for the scalar complex mask_round operand.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
Ok for master?
gcc/ChangeLog:
PR target/104977
* config/i386/sse.md
(avx512fp16_fmash_v8hf):
Correct round operand for intel
Hi,
For complex scalar intrinsics like _mm_mask_fcmadd_sch, the
mask should be ANDed with 1 to ensure the mask is bound to the lowest byte.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
Ok for master?
gcc/ChangeLog:
PR target/104978
* config/i386/sse.md
(avx512fp16
wrote on Mon at 09:08:
>
> On Sat, Mar 19, 2022 at 8:09 AM Hongyu Wang via Gcc-patches
> wrote:
> >
> > Hi,
> >
> > For complex scalar intrinsic like _mm_mask_fcmadd_sch, the
> > mask should be and by 1 to ensure the mask is bind to lowest byte.
> >
> &g
> >
> > Hongtao Liu via Gcc-patches wrote on Mon, Mar 21, 2022 at 09:08:
> > >
> > > On Sat, Mar 19, 2022 at 8:09 AM Hongyu Wang via Gcc-patches
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > For complex scalar intrinsic lik
Hi,
For complex scalar intrinsics like _mm_mask_fcmadd_sch, the
mask should be ANDed with 1 to ensure the mask is bound to the lowest byte.
Use a masked vmovss to perform the same operation, which omits the higher bits
of the mask.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
Ok for master?
gcc/ChangeLo
here are strictly V8HF operands from builtin input.
I suppose there should be no chance of passing a different-size subreg
to the expander; otherwise the (__v8hf) conversion in the builtin would fail
first.
Hongtao Liu via Gcc-patches wrote on Mon, Mar 21, 2022 at 20:53:
>
> On Mon, Mar 21, 2022 at 7:52 PM Hongyu Wan
Hi, here is the patch with force_reg before lowpart_subreg.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
Ok for master?
For complex scalar intrinsics like _mm_mask_fcmadd_sch, the
mask should be ANDed with 1 to ensure the mask is bound to the lowest byte.
Use a masked vmovss to perform the same
Is it possible to create a test case that gas would throw an error for
invalid operands?
H.J. Lu via Gcc-patches wrote on Sat, Mar 26, 2022 at 04:50:
>
> Since KL instructions have no AVX512 version, replace the "v" register
> constraint with the "x" register constraint.
>
> PR target/105058
>
> > Is it possible to create a test case that gas would throw an error for
> > invalid operands?
>
> You can use -ffix-xmmN to disable XMM0-15.
I mean, can we create an intrinsic test for this PR that produces xmm16-31?
And is -ffix-xmmN an option for the assembler or the compiler? I didn't
find it in
Hi,
For -mrelax-cmpxchg-loop, which relaxes atomic_fetch_ loops,
there is a missing set of %eax when the compare fails, which would result
in an infinite loop in some benchmarks. Add the set of %eax to avoid it.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
Ok for master?
gcc/ChangeLog:
PR ta
Hi,
From the -Os point of view, stv converts scalar registers to vector mode,
which introduces extra reg conversions and increases instruction size.
Disabling stv under optimize_size avoids such code size increases, and there
is no need to touch ix86_size_cost, which has not been tuned for a long
time.
Bootstrap
+#define max(a,b) (((a) > (b))? (a) : (b))
+#define min(a,b) (((a) < (b))? (a) : (b))
+
+int foo(int x)
+{
+ return max(x,0);
+}
+
+int bar(int x)
+{
+ return min(x,0);
+}
+
+unsigned int baz(unsigned int x)
+{
+ return min(x,1);
+}
+
+/* { dg-final { scan-assembler-not "xmm" } } */
--
2.18.1
Richard Biener via Gc
000..d997e26e9ed
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > @@ -0,0 +1,23 @@
> > +/* PR target/105034 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -msse4.1" } */
> > +
> > +#define max(a,b) (((a) > (b))? (a
&& optimize > 1);
> > > > + && TARGET_STV && TARGET_SSE2 && optimize > 1
> > > > + && optimize_function_for_speed_p (cfun));
> > >
> > > ... and use it here instead of referencing 'cfun'
> >
Hi,
Compiling the _mm_crc32_u8/16/32/64 intrinsics with -mcrc32
hits a target-specific option mismatch. Correct the target pragma
to fix this.
Bootstrapped/regtest on x86_64-pc-linux-gnu{-m32,}.
Ok for master and backport to GCC 11?
gcc/ChangeLog:
* config/i386/smmintrin.h: Correct target pragma
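A small reproducer sketch (not the committed test) of the usage that hit the mismatch when built with only -mcrc32:

#include <immintrin.h>

/* Build with -mcrc32 and without -msse4.2; before the pragma fix this
   failed with a target specific option mismatch.  */
unsigned int
crc (unsigned int seed, unsigned int data)
{
  return _mm_crc32_u32 (seed, data);
}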
> This test should not be changed, it correctly reports ISA mismatch. It
> even passes -mno-crc32.
The error message changes from "needs isa option -mcrc32" to "target
specific option mismatch" with the #pragma change.
I see many of our intrinsics would throw such an error; it has been a long
term iss
Hi,
Add the missing macro under -O0 and adjust the macro format for the scalef
intrinsics.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for master and backport to GCC 9/10/11?
gcc/ChangeLog:
PR target/105339
* config/i386/avx512fintrin.h (_mm512_scalef_round_pd):
Add pare
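A usage sketch of one affected intrinsic; the rounding argument is just an example:

#include <immintrin.h>

/* At -O0 the intrinsic goes through the macro definition, which was
   broken before this fix.  Requires -mavx512f.  */
__m512d
scale (__m512d a, __m512d b)
{
  return _mm512_scalef_round_pd (a, b, _MM_FROUND_TO_NEAREST_INT
                                       | _MM_FROUND_NO_EXC);
}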
> Please add the corresponding intrinsic test in sse-14.c
Sorry for forgetting this part. Updated patch. Thanks.
Hongtao Liu via Gcc-patches wrote on Fri, Apr 22, 2022 at 16:49:
>
> On Fri, Apr 22, 2022 at 4:12 PM Hongyu Wang via Gcc-patches
> wrote:
> >
> > Hi,
> >
> > A
Hi,
cmpxchg is commonly used in spin loops, and some user code,
such as pthread, directly uses cmpxchg as the loop condition, which causes
huge cache bouncing.
This patch extends the previous implementation to relax all cmpxchg
instructions under -mrelax-cmpxchg-loop with an extra atomic load,
com
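At the source level the affected shape is a compare-exchange spin loop; a sketch of the idea (the transformation itself is done by the compiler on the generated loop, not in user code):

#include <stdatomic.h>

/* Relaxed form of a spin lock: read the line first and only issue the
   locked cmpxchg when the lock looks free, reducing cache bouncing.  */
void
spin_lock (atomic_int *lock)
{
  for (;;)
    {
      int expected = 0;
      if (atomic_load_explicit (lock, memory_order_relaxed) == 0
          && atomic_compare_exchange_weak (lock, &expected, 1))
        return;
    }
}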
Hi,
This patch intends to sync with the LLVM change in
https://reviews.llvm.org/D120307 to add the enumeration and truncate the
imm to unsigned char, so users can use ~ on immediates.
Bootstraped/regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for master?
gcc/ChangeLog:
* config/i386/avx512fintrin.h
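Assuming this refers to the ternary-logic intrinsics in avx512fintrin.h, a hedged sketch of what the unsigned char truncation allows:

#include <immintrin.h>

/* With the immediate truncated to unsigned char, ~0x0f is accepted and
   becomes the 8-bit table value 0xf0.  Requires -mavx512f.  */
__m512i
tern (__m512i a, __m512i b, __m512i c)
{
  return _mm512_ternarylogic_epi32 (a, b, c, ~0x0f);
}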
Hi,
For V8HFmode vector init with HFmode, do not directly emit a V8HF move
with a subreg, which may cause reload to assign a general register to the move
src.
Bootstraped/regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for master?
gcc/ChangeLog:
PR target/104664
* config/i386/i386-expand.cc
Hi Uros,
For -mrelax-cmpxchg-loop introduced by PR 103069/r12-5265, it would
produce an infinite loop. The correct code should be
.L84:
        movl    (%rdi), %ecx
        movl    %eax, %edx
        orl     %esi, %edx
        cmpl    %eax, %ecx
        jne     .L82
        lock cmpxchgl   %edx, (%r
Hi,
Here is the updated patch that aligns the implementation with AVX-VNNI
and corrects some spelling errors in the AVX512IFMA patterns.
Bootstrapped/regtested on x86_64-pc-linux-gnu and sde. Ok for trunk?
gcc/
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA_AVXIFMA_SET, OPTION_MAS
Hi,
Inspired by the rs6000 and s390 port changes, this patch
enables loop unrolling for small loops at -O2 by default.
The default behavior is to unroll loops with unknown trip count and
fewer than 4 insns by 1 time.
This improves 548.exchange2 by 3.5% on Icelake and 6% on Zen3 with
a 1.2% codesize in
> Does this setting benefit all targets? IIRC, in the past all
> benchmarks also enabled -funroll-loops, so it looks to me that
> unrolling small loops by default is a good compromise.
The idea of unrolling small loops can be explained by the x86
micro-architecture. Modern x86 processors have multi
> Ugh, that's all quite ugly and unmaintainable, no?
Agreed, I have the same feeling.
> I'm quite sure that if this works it's not by intention. Doesn't this
> also disable
> register renaming and web when the user explicitely specifies -funroll-loops?
>
> Doesn't this change -funroll-loops behav
Hi, this is the updated patch of
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604345.html,
which uses targetm.loop_unroll_adjust as a gate to enable small loop unrolling.
This patch does not change rs6000/s390 since I don't have machines to
test them, but I suppose the default behavior is the
> +(define_split
> + [(set (reg:CCZ FLAGS_REG)
> + (compare:CCZ (unspec:SI
> + [(eq:VI1_AVX2
> + (match_operand:VI1_AVX2 0 "vector_operand")
> + (match_operand:VI1_AVX2 1 "const0_operand"))]
> + UNSPE
> I don't think *_os_support calls should be removed. IIRC,
> __builtin_cpu_supports function checks if the feature is supported by
> CPU, whereas *_os_supports calls check via xgetbv if OS supports
> handling of new registers.
avx_os_support is like
avx_os_support (void)
{
unsigned int eax, ed
Hi,
This patch adds a constraint "Ws" to allow an absolute symbolic address for
either a function or a variable. This also works under -mcmodel=large.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
Ok for master?
gcc/ChangeLog:
PR target/105576
* config/i386/constraints.md (Ws):
Oh, I just found that asm ("%p0" :: "i"(addr)); also works under
-mcmodel=large in this case; please ignore this patch. Thanks.
Uros Bizjak via Gcc-patches wrote on Wed, May 18, 2022 at 17:46:
>
> On Wed, May 18, 2022 at 9:32 AM Hongyu Wang wrote:
> >
> > Hi,
> >
> > This patch adds a constraint "Ws" to allow ab
> -fpic will break compilation with "i" constraint.
Ah, yes. But "X" is like having no constraint at all; shouldn't we provide
something similar to "S" on aarch64 and riscv?
I think it is better to constrain the operand to constant symbols
rather than allowing everything.
Uros Bizjak wrote on Wed, May 18, 2022 at 18:18:
>
Hi
According to the discussion in
https://gcc.gnu.org/pipermail/gcc/2020-November/234096.html,
the testcases for keylocker-* are too strict for the Darwin target. This
patch adjusts the regex and adds a missing test for the aesenc256kl
instruction.
Tested by Iain Sandoe; all tests pass on the Darwin target.
>
> Please rewrite scan strings back to using double-quotation marks.
>
Yes, updated patch.
Uros Bizjak wrote on Mon, Nov 9, 2020 at 7:41 PM:
>
> On Mon, Nov 9, 2020 at 11:50 AM Hongyu Wang wrote:
> >
> > Hi
> >
> > According to the discussion in
> > https://gcc.gnu.org/pipermail/gcc/2020-November/234096.ht
Hi,
According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770, the x86
backend needs a popcount2 expander so __builtin_popcount can be
auto-vectorized on AVX512BITALG/AVX512VPOPCNTDQ targets.
For DImode the middle-end vectorizer could not generate the expected code,
and for QI/HImode there is no c
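A sketch of a loop that is expected to vectorize once the expander exists (flags and loop shape are illustrative):

/* With e.g. -O3 -mavx512vpopcntdq -mavx512vl this is expected to use
   vpopcntd instead of a scalar popcnt per element.  */
void
popcounts (unsigned int *restrict dst, const unsigned int *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = __builtin_popcount (src[i]);
}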
Hi
Thanks for reminding me about this patch. I didn't remove any existing
intrinsics, just removed redundant builtin functions that end users
would be unlikely to use.
Also I'm OK with keeping the current implementation, in case someone
is using the builtins directly.
Jeff Law wrote on Fri, Nov 13, 2020 at
Hi,
This patch extends the expanders for cond_op to support vector HF modes.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for master?
gcc/ChangeLog:
* config/i386/sse.md (cond_): Extend to support
vector HFmodes.
(cond_mul): Likewise.
(cond_div): Lik
> >This patch extend the expanders for cond_op to support vector HF modes.
> >bootstraped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Do runtime tests passe on sde{-m32,}?
Yes, forgot to mention this.
Liu, Hongtao via Gcc-patches wrote on Thu, Sep 23, 2021 at 5:31 PM:
>
>
>
> >-Original Message-
>
Hi Uros,
This patch intends to support V4HF/V2HF vector type and basic operations.
For 32bit target, V4HF vector is parsed same as __m64 type, V2HF
is parsed by stack and returned from GPR since it is not specified
by ABI.
We found for 64bit vector in ia32, when mmx disabled there seems no
mov_i
> ia32 ABI declares that __m64 values pass via MMX registers. Due to
> this, we are not able to fully disable MMX register usage, as is the
> case with x86_64. So, V4HFmode values will pass to functions via MMX
> registers on ia32 targets.
>
> So, there should be no additional define_insn, the addi
> I'd put this new pattern in mmx.md to keep 64bit/32bit modes in
> mmx.md, similar to e.g. FMA patterns among others.
Yes, I put it after single-float patterns. Attached the patch I'm
going to check-in.
Thanks for your review.
Uros Bizjak wrote on Tue, Sep 28, 2021 at 2:27 PM:
>
> On Tue, Sep 28, 2021 at 6:48
Hi,
The _tile_loadd, _tile_stored, and _tile_streamloadd intrinsics are defined as
macros, so the parameters should be wrapped in parentheses to accept
expressions.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
OK for master and backport to GCC11 branch?
gcc/ChangeLog:
* config/i
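A usage sketch with expression arguments (the buffer and stride names are hypothetical):

#include <immintrin.h>

/* Requires -mamx-tile.  Without the parentheses in the macro body,
   expression arguments such as buf + 64 or 2 * stride could expand with
   the wrong precedence.  */
void
load_tile (const char *buf, long stride)
{
  _tile_loadd (0, buf + 64, 2 * stride);
}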
Hi,
AVX512VNNI/AVXVNNI has vpdpwssd for HImode and vpdpbusd for QImode, so
adjust the HImode sdot_prod expander and add a QImode usdot_prod expander
to enhance vectorization of dot products.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
Ok for master?
gcc/ChangeLog:
* config/i386/sse.
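Sketches of the dot-product loops this targets (loop shapes are illustrative):

/* Signed 16-bit dot product: candidate for vpdpwssd via sdot_prod.  */
int
dot_s16 (const short *a, const short *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}

/* Unsigned x signed 8-bit dot product: candidate for vpdpbusd via
   usdot_prod.  */
int
dot_u8s8 (const unsigned char *a, const signed char *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}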
> Could you add a testcase for that?
Yes, updated patch.
Hongtao Liu via Gcc-patches wrote on Thu, Nov 4, 2021 at 10:25 AM:
>
> On Thu, Nov 4, 2021 at 9:19 AM Hongyu Wang via Gcc-patches
> wrote:
> >
> > Hi,
> >
> > _tile_loadd, _tile_stored, _tile_streamloadd intrins
Hi,
From the CPU's point of view, getting a cache line for writing is more
expensive than reading. See Appendix A.2 Spinlock in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/
xeon-lock-scaling-analysis-paper.pdf
The full compare and swap will grab the cache line ex
Hi,
From the CPU's point of view, getting a cache line for writing is more
expensive than reading. See Appendix A.2 Spinlock in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers
/xeon-lock-scaling-analysis-paper.pdf
The full compare and swap will grab the cache line e
Thanks for your review, this is the patch I'm going to check-in.
Uros Bizjak via Gcc-patches wrote on Mon, Nov 15, 2021 at 4:25 PM:
>
> On Sat, Nov 13, 2021 at 3:34 AM Hongyu Wang wrote:
> >
> > Hi,
> >
> > From the CPU's point of view, getting a cache line for writing is more
> > expensive than reading. See
Hi,
The current mask/mask3 implementation for complex fma contains a
duplicated parameter in the macro, which may cause errors at -O0.
Refactor the macro implementation into builtins to avoid the potential
error.
For round intrinsics with NO_ROUND as input, ix86_erase_embedded_rounding
erases the embedded_rounding unspec
Hi,
This patch supports HFmode vector shuffles by creating an HImode subreg when
expanding the permutation expr.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,}
OK for master?
gcc/ChangeLog:
* config/i386/i386-expand.c (ix86_expand_vec_perm): Convert
HFmode input ope
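A sketch of a _Float16 permutation that goes through this path; the typedefs and indices are only illustrative:

typedef _Float16 v8hf __attribute__ ((vector_size (16)));
typedef short v8hi __attribute__ ((vector_size (16)));

/* The shuffle is expanded on HImode subregs of the HF vectors.  */
v8hf
interleave_lo (v8hf a, v8hf b)
{
  return __builtin_shuffle (a, b, (v8hi) { 0, 8, 1, 9, 2, 10, 3, 11 });
}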
Hi,
-march=cascadelake, which includes -mavx512vl, produces an unmatched scan
for the vf[c]maddcsh test, so add -mno-avx512vl to vf[c]maddcsh-1a.c.
Also add a scan for vblendmps to the vf[c]maddcph tests to check correctness.
Tested on unix{-m32,} with -march=cascadelake.
Pushed to trunk as obvious fix.
gcc/
Hi,
For V4HFmode, doing a vector concat like
__builtin_shufflevector (a, b, {0, 1, 2, 3, 4, 5, 6, 7})
could trigger an ICE since it is not handled in ix86_vector_init ().
Handle HFmode like HImode to avoid such an ICE.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,}
OK for master
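The concat case above, spelled out as a compilable sketch (typedefs are illustrative):

typedef _Float16 v4hf __attribute__ ((vector_size (8)));
typedef _Float16 v8hf __attribute__ ((vector_size (16)));

/* Concatenating two V4HF vectors into one V8HF used to ICE in the
   vector-init path.  */
v8hf
concat (v4hf a, v4hf b)
{
  return __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 5, 6, 7);
}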
> This part seems not related to vector shuffle.
Yes, I have separated this part into another patch and checked it in.
Updated patch. Ok for this one?
Hongtao Liu via Gcc-patches wrote on Thu, Oct 14, 2021 at 2:33 PM:
>
> On Thu, Oct 14, 2021 at 10:39 AM Hongyu Wang via Gcc-patches
> wrote
have separated this part to another patch and checked-in.
> >
> > Updated patch. Ok for this one?
> >
> > Hongtao Liu via Gcc-patches wrote on Thu, Oct 14, 2021 at 2:33 PM:
> > >
> > > On Thu, Oct 14, 2021 at 10:39 AM Hongyu Wang via Gcc-patches
> > > w
Since the _Float16 type is enabled under the SSE2 target, returning a
V8HFmode vector without the AVX512F target would generate a wrong
vmovdqa64 instruction. Adjust ix86_get_ssemov to avoid this.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
OK for master?
gcc/ChangeLog:
PR target/102812
__vector_size__ (16)));
+
+v8hf t (_Float16 a)
+{
+return (v8hf) {a, 0, 0, 0, 0, 0, 0, 0};
+}
--
2.18.1
Hongtao Liu via Gcc-patches wrote on Thu, Oct 21, 2021 at 1:24 PM:
>
> On Wed, Oct 20, 2021 at 1:31 PM Hongyu Wang via Gcc-patches
> wrote:
> >
> > Since _Float16 type is ena
Thanks for reminding me of this; I will adjust the testcase since the output
for 128/256-bit HFmode loads has changed.
Martin Liška wrote on Thu, Oct 21, 2021 at 8:49 PM:
>
> On 10/21/21 07:47, Hongyu Wang via Gcc-patches wrote:
> > Yes, updated patch.
>
> Note the patch caused the following test
Hi,
The HF vector moves have been updated to align with HI vectors;
adjust the corresponding testcases for _Float16 vector load and store.
Tested on x86_64-pc-linux-gnu{-m32,}, pushed as obvious fix.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-13.c: Adjust scan-assembler for
xmm/
I think this can be put in as an obvious fix.
Thanks for the patch.
Rainer Orth wrote on Mon, Oct 25, 2021 at 9:53 PM:
>
> The gcc.target/i386/avx512fp16-trunchf.c test FAILs on 32-bit Solaris/x86:
>
> FAIL: gcc.target/i386/avx512fp16-trunchf.c scan-assembler-times vcvttsh2si[
> t]+[^{\\n]*(?:%xmm[0-9]|\
Hi,
For the _Float16 type, add insns and expanders to optimize x / y to
x * rcp (y), and x / sqrt (y) to x * rsqrt (y).
As half floats only have a minor precision difference between div and
mul * rcp, there is no need for a Newton-Raphson approximation.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
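A minimal sketch of the scalar case; the rcp/rsqrt forms only apply under the fast-math style options that permit reciprocal approximations:

/* x / y can become x * rcp (y); the x / sqrt (y) form is handled the
   same way with rsqrt.  */
_Float16
hdiv (_Float16 x, _Float16 y)
{
  return x / y;
}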
If there is no objection, I'm going to backport the m_SAPPHIRERAPIDS
and m_ALDERLAKE change to GCC 12.
Uros Bizjak via Gcc-patches wrote on Wed, Dec 7, 2022 at 15:11:
>
> On Wed, Dec 7, 2022 at 7:36 AM Hongyu Wang wrote:
> >
> > For Alderlake there is similar issue like PR 81616, enable
> > avoid_fma256_chai
Hi,
For PR27, the wrong code was caused by a wrong expander for maskz.
Correct the parameter order for the avx512ne2ps2bf16_maskz expander.
Bootstrapped/regtested on x86-64-pc-linux-gnu{m32,}.
OK for master and backport to GCC13?
gcc/ChangeLog:
PR target/27
* config/i386/sse.
Like base_reg_class, INDEX_REG_CLASS does not support a backend insn argument.
Add index_reg_class with an insn argument for lra/reload usage.
gcc/ChangeLog:
* addresses.h (index_reg_class): New wrapper function like
base_reg_class.
* doc/tm.texi: Document INSN_INDEX_REG_CLASS.
Intel Advanced Performance Extensions (APX) has been released in [1].
It contains several extensions such as 16 extended general purpose registers
(EGPRs), push2/pop2, new data destination (NDD), and conditional compare
(CCMP/CTEST) combined with suppress-flags-write versions of common instructions
(NF).
From: Kong Lingling
In inline asm, we do not know whether the insn can use EGPRs, so disable EGPR
usage by default by mapping the common reg/mem constraints to non-EGPR
constraints. Use the flag -mapx-inline-asm-use-gpr32 to enable EGPR usage
for inline asm.
gcc/ChangeLog:
* config/i386/i386.cc
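A hedged sketch of inline asm affected by the mapping; the asm body is arbitrary:

/* With APX enabled, the "r" constraints below are mapped to non-EGPR
   classes by default; -mapx-inline-asm-use-gpr32 allows r16-r31 here
   as well.  */
long
asm_add (long a, long b)
{
  __asm__ ("add %1, %0" : "+r" (a) : "r" (b));
  return a;
}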
From: Kong Lingling
Add -mapx-features= enumeration to separate subfeatures of APX_F.
-mapxf is treated the same as previous ISA flags, while it sets
-mapx-features=apx_all, which enables all subfeatures.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (XSTATE_APX_F): New macro.
(XCR_APX
For vector move insns like vmovdqa/vmovdqu, their evex counterparts
require an explicit suffix 64/32/16/8. The use of these instructions
is prohibited under AVX10_1 or AVX512F, so for AVX2+APX_F we select
vmovaps/vmovups for vector load/store insns that contain EGPRs.
gcc/ChangeLog:
* con
From: Kong Lingling
The current reload infrastructure does not support a selective base_reg_class
per backend insn. Add an insn argument to base_reg_class for
lra/reload usage.
gcc/ChangeLog:
* addresses.h (base_reg_class): Add insn argument.
Pass to MODE_CODE_BASE_REG_CLASS.
(r
From: Kong Lingling
For APX, as we extended the GENERAL_REG_CLASS, new constraints are
needed to restrict insns that cannot use EGPRs in either their register or
memory operands.
gcc/ChangeLog:
* config/i386/constraints.md (h): New register constraint
for GENERAL_GPR16.
(Bt):
From: Kong Lingling
Add backend helper functions to verify whether an rtx_insn can use an EGPR as
the base/index reg of a memory operand. The verification rules are:
1. For asm insns, enable/disable EGPRs via ix86_apx_inline_asm_use_gpr32.
2. Disable EGPRs for unrecognized insns.
3. If which_alternat
From: Kong Lingling
Disable EGPR usage for the legacy insns below in opcode map2/3 that have a vex
but no evex counterpart.
insn list:
1. phminposuw/vphminposuw
2. ptest/vptest
3. roundps/vroundps, roundpd/vroundpd,
roundss/vroundss, roundsd/vroundsd
4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
5.
From: Kong Lingling
Extend GENERAL_REGS with extra r16-r31 registers, similar to the REX registers,
named REX2 registers. They will only be enabled under
TARGET_APX_EGPR.
gcc/ChangeLog:
* config/i386/i386-protos.h (x86_extended_rex2reg_mentioned_p):
New function prototype.
* con
From: Kong Lingling
These legacy insns in opcode map0/1 only support GPR16
and do not have vex/evex counterparts; directly adjust the constraints and
add the gpr32 attr to the patterns.
insn list:
1. xsave/xsave64, xrstor/xrstor64
2. xsaves/xsaves64, xrstors/xrstors64
3. xsavec/xsavec64
4. xsaveopt/xsaveopt6
From: Kong Lingling
These legacy insns in opcode map2/3 have a vex but no evex
counterpart; disable EGPRs for them by adjusting the alternatives and
attr_gpr32.
insn list:
1. phaddw/vphaddw, phaddd/vphaddd, phaddsw/vphaddsw
2. phsubw/vphsubw, phsubd/vphsubd, phsubsw/vphsubsw
3. psignb/vpsginb, psignw/v