/other/i386-3.C: Likewise.
--
Regards,
Hongyu, Wang
From 9dbb6bfb28431cd52149e12cc5f359be7fb46c64 Mon Sep 17 00:00:00 2001
From: Hongyu Wang
Date: Tue, 7 Apr 2020 18:39:53 +
Subject: [PATCH] Enable Intel HRESET Instruction
gcc/
* common/config/i386/cpuinfo.h (get_available_features
://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
Bootstrap ok, regression test on i386/x86 backend is ok.
OK for master?
2020-10-13 Hongtao Liu
Hongyu Wang
gcc/
* common/config/i386/cpuinfo.h
et
passed.
And there is no intrinsic/builtin with const int parameter. So we remove
-muintr from these files.
Uros Bizjak wrote on Wed, Oct 14, 2020 at 2:18 PM:
> On Tue, Oct 13, 2020 at 10:30 AM Hongyu Wang
> wrote:
> >
> > Hi:
> >
> > This patch is about to support User
Uros Bizjak wrote on Wed, Oct 14, 2020 at 4:42 PM:
> On Wed, Oct 14, 2020 at 10:34 AM Hongyu Wang
> wrote:
> >
> > >
> > > Please also add -muintr to g++.dg/other/i386-{2,3}.C and
> > > gcc.target/i386-sse-{12,13,14,22,23}.c. This will test new intrinsics
> > >
Uros Bizjak wrote on Wed, Oct 14, 2020 at 4:53 PM:
>
> On Wed, Oct 14, 2020 at 10:42 AM Uros Bizjak wrote:
> >
> > On Wed, Oct 14, 2020 at 10:34 AM Hongyu Wang wrote:
> > >
> > > >
> > > > Please also add -muintr to g++.dg/other/i386-{2,3}.C and
> > >
Uros Bizjak wrote on Wed, Oct 14, 2020 at 7:19 PM:
>
> > > Please also add -muintr to g++.dg/other/i386-{2,3}.C and
> > > >> > > gcc.target/i386-sse-{12,13,14,22,23}.c. This will test new
> > > >> > > intrinsics header.
> > > >> > >
> > > >> >
> > > >> > Thanks for your review. We found that without adding -mui
help.
H.J. Lu wrote on Wed, Oct 14, 2020 at 9:35 PM:
>
> On Wed, Oct 14, 2020 at 6:31 AM Hongyu Wang via Gcc-patches
> wrote:
> >
> > Uros Bizjak wrote on Wed, Oct 14, 2020 at 7:19 PM:
> > >
> > > > > Please also add -muintr to g++.dg/other/i386-{2,3}.C and
> > > > >
>
> The patch doesn't include all testsuite changes.
>
Yes, I updated -mhreset in x86gprintrin-{1,2,3,4,5}.c.
We will check in the attached patch. Thanks.
Uros Bizjak wrote on Wed, Oct 14, 2020 at 2:26 PM:
>
> On Tue, Oct 13, 2020 at 10:49 AM Hongyu Wang
> wrote:
> >
> > Hi
Hi,
> IIRC, adding a new regclass is O(n^2), so it should be avoided. I
> think that the new patterns should follow the same path as vzeroall
> and vzeroupper patterns, where we emit the pattern with explicit hard
> regs.
>
> BTW: We do have SSE_FIRST_REG class, but this class was added to solve
>
Hongyu Wang wrote on Wed, Oct 14, 2020 at 11:27 AM:
>
> Hi:
>
> This patch is about to support Intel AVX-VNNI instructions.
>
> AVX-VNNI is equivalent to AVX512-VNNI with VEX encoding. The instructions
> are the same, but with an extra {vex} prefix to distinguish them from AVX512-VNNI
> i
Oct 21, 2020 at 1:48 PM Uros Bizjak wrote:
> >
> > On Wed, Oct 21, 2020 at 11:11 AM Hongyu Wang wrote:
> > >
> > > Hi,
> > >
> > > > IIRC, adding a new regclass is O(n^2), so it should be avoided. I
> > > > think that the new patter
c.
Thanks for all the helpful comments. Updated patch.
Hongtao Liu wrote on Thu, Oct 29, 2020 at 9:53 AM:
>
> On Wed, Oct 28, 2020 at 8:24 PM Uros Bizjak wrote:
> >
> > On Wed, Oct 28, 2020 at 10:54 AM Hongyu Wang wrote:
> > >
> > > Hi Uros,
> > >
> >
Thanks for your review! I'll ask Hongtao to check in the patch for me.
Uros Bizjak wrote on Thu, Oct 29, 2020 at 4:08 PM:
>
> On Thu, Oct 29, 2020 at 7:52 AM Hongyu Wang wrote:
> >
> > Hi Uros,
> >
> > > is there a reason to introduce all these (with corresponding c
Hi,
This is a follow-up patch for PR98167
The sequence
c1 = VEC_PERM_EXPR (a, a, mask)
c2 = VEC_PERM_EXPR (b, b, mask)
c3 = c1 op c2
can be optimized to
c = a op b
c3 = VEC_PERM_EXPR (c, c, mask)
for all integer vector operations, and for float operations with
a full permutation.
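
The rewrite above can be checked lane by lane with a small scalar model. This is only a sketch under stated assumptions: `LANES`, `perm_same` and `perm_commutes_with_add` are hypothetical names for illustration and not part of the patch, plain arrays stand in for vector types, and unsigned addition stands in for "op".

```c
#include <stdint.h>

#define LANES 4

/* Model of VEC_PERM_EXPR (v, v, mask): with both inputs equal, lane i of
   the result is v[mask[i] % LANES].  Hypothetical helper, not GCC code.  */
static void
perm_same (uint32_t out[LANES], const uint32_t v[LANES],
           const unsigned mask[LANES])
{
  for (int i = 0; i < LANES; i++)
    out[i] = v[mask[i] % LANES];
}

/* Return 1 if permuting both inputs and then adding gives the same lanes
   as adding first and permuting once, 0 otherwise.  */
int
perm_commutes_with_add (const uint32_t a[LANES], const uint32_t b[LANES],
                        const unsigned mask[LANES])
{
  uint32_t c1[LANES], c2[LANES], before[LANES];
  uint32_t c[LANES], after[LANES];

  /* Original sequence: c1 = perm (a), c2 = perm (b), c3 = c1 op c2.  */
  perm_same (c1, a, mask);
  perm_same (c2, b, mask);
  for (int i = 0; i < LANES; i++)
    before[i] = c1[i] + c2[i];

  /* Rewritten sequence: c = a op b, c3 = perm (c).  */
  for (int i = 0; i < LANES; i++)
    c[i] = a[i] + b[i];
  perm_same (after, c, mask);

  for (int i = 0; i < LANES; i++)
    if (before[i] != after[i])
      return 0;
  return 1;
}
```

For integer lanes the two sequences agree for any mask, including one that repeats or drops lanes. With a mask that drops lanes, the rewritten form still computes "a op b" on lanes the original never touched, which is presumably why float operations are limited to full permutations.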
tunings instead of 1.
Yes, here is the updated patch that changes the cost table.
Bootstrapped & regrtested on x86_64-pc-linux-gnu.
Ok for trunk?
Hongtao Liu via Gcc-patches wrote on Tue, Nov 8, 2022 at 11:05:
>
> On Mon, Nov 7, 2022 at 10:25 PM Richard Biener via Gcc-patches
> wrote:
> >
>
c-patches wrote on Tue, Nov 8, 2022 at 22:38:
>
> On Fri, Nov 4, 2022 at 7:44 AM Prathamesh Kulkarni via Gcc-patches
> wrote:
> >
> > On Fri, 4 Nov 2022 at 05:36, Hongyu Wang via Gcc-patches
> > wrote:
> > >
> > > Hi,
> > >
> > > This is a foll
xample! We also tried using wide_int as a bitmask
but your code looks simpler and more reasonable.
Updated the patch accordingly.
Richard Biener wrote on Thu, Nov 10, 2022 at 16:56:
>
> On Thu, Nov 10, 2022 at 3:27 AM Hongyu Wang wrote:
> >
> > Hi Prathamesh and Richard,
> >
> >
Thanks for the notification! I was not aware of the compile farm before. I will
see what the impact of my patch is.
Regards,
Hongyu, Wang
From: David Edelsohn
Sent: Thursday, November 10, 2022 1:22 AM
To: Wang, Hongyu
Cc: GCC Patches
Subject: Re: [PATCH V2] Enable small loop unrolling for O2
> Ok, Note GCC documents have been ported to sphinx, so you need to
> adjust changes in invoke.texi to new sphinx files.
Yes, this is the patch I'm going to check in. Thanks.
Hongtao Liu wrote on Mon, Nov 14, 2022 at 09:35:
>
> On Wed, Nov 9, 2022 at 9:29 AM Hongyu Wang wrote:
>
Hi,
According to PR 107676, the documentation of -mrelax-cmpxchg-loop is nonsensical.
Adjust the wording according to the comments.
Bootstrapped on x86_64-pc-linux-gnu, ok for trunk?
gcc/ChangeLog:
PR target/107676
* doc/invoke.texi: Reword the description of
-mrelax-cmpxchg-
issuing an atomic load before the
> @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
> to save CPU power when restarting the loop.
>
> Alexander
From e82f3e03115480ac3d055819658a107249932c65 Mon Sep 17 00:00:00 2001
From: Hongyu Wang
Date: Tue, 15 Nov 2022 11:16:17 +0800
Subje
> Please use 'git commit --author' to indicate authorship of the patch
> (or simply let me push it once approved).
Yes, just change the author and push it.
Thanks for your help!
Hi,
r13-3950-g071e428c24ee8c enables O2 small loop unrolling, but it breaks
-fno-unroll-loops for rs6000 with loop_unroll_adjust hook. Adjust the
option handling and target hook accordingly.
Bootstrapped & regtested on powerpc64le-linux-gnu, OK for trunk?
gcc/ChangeLog:
PR target/107692
> I assume the "full permutation" condition is to avoid performing some
> extra operations that would raise exception flags. If so, are there
> conditions (-fno-trapping-math?) where the transformation would be safe
> with arbitrary shuffles?
Yes, that could be an alternative choice with -fno-trap
According to the official guide, please sort by last name in
alphabetical order, which means you should put your name between
Dave Korn
Julia Koval
Kong, Lingling via Gcc-patches wrote on Mon, Jun 27, 2022 at 16:05:
>
> Hi,
>
> I want to add myself to MAINTAINERS for write after approval.
>
> OK for maste
Sorry, should be between
Boris Kolpackov
Dave Korn
Hongyu Wang wrote on Mon, Jun 27, 2022 at 16:29:
>
> According to the official guide, please sort by last name in
> alphabetical order, which means you should put your name between
>
> Dave Korn
> Julia Koval
>
> Kong, Lingling v
of increasing CLEAR_RATIO on
>
> Hongyu, please collect code size differences on SPEC CPU 2017 and
> eembc.
>
> > SPEC/eembc? Did you play with other values of MOVE/CLEAR_RATIO?
> > 17 memory-to-memory/memory-clear insns looks quite a lot.
> >
>
> Yes, we did. 256 bytes is the threshold above which memcpy/memset in libc
> win. Below 256 bytes, 16 by_pieces move/store is faster.
>
> --
> H.J.
--
Regards,
Hongyu, Wang
and using short sequence
> > for those would be nice.
> >
> > Having minsize non-trivial may not be that uncommon these days either
> > given that we track value ranges (and under assumption that
> > memcpy/memset expanders was updated to take these into account).
> >
>
> Hongyu has done some analysis on this. Hongyu, can you share what
> you got?
>
> Thanks.
>
> --
> H.J.
--
Regards,
Hongyu, Wang
> Do you know what of the three changes (preferring reps/stosb,
> CLEAR_RATIO and algorithm choice changes) cause the two speedups
> on eembc?
An extracted testcase from nnet_test is in https://godbolt.org/z/c8KdsohTP
This loop is transformed to builtin_memcpy and builtin_memset with size 280.
Curre
For Keylocker aesenc/aesdec intrinsics, the current implementation
moves idata to odata unconditionally, which causes a safety issue when
the instruction hits a runtime error. So we add a branch to clear
odata when ZF is set after instruction execution.
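
The control flow described above can be sketched in plain C. This is a model under assumptions, not the code in the patch: `block128`, `model_aes_kl_insn`, `model_aes_kl_wrapper` and `block_is_zero` are hypothetical names, the XOR is a placeholder rather than real AES, and the stub is assumed to leave its data register untouched when it reports failure through ZF.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } block128;

/* Hypothetical stand-in for the instruction: returns the modelled ZF
   (1 = failure).  On failure the data register is left untouched.  */
static int
model_aes_kl_insn (block128 *reg, int handle_ok)
{
  if (!handle_ok)
    return 1;
  for (int i = 0; i < 16; i++)
    reg->b[i] ^= 0x5a;   /* placeholder transform, not real AES */
  return 0;
}

/* Expansion with the extra branch: odata is cleared when ZF is set, so
   the untransformed input never reaches the output buffer.  Returns the
   modelled ZF.  */
int
model_aes_kl_wrapper (block128 *odata, block128 idata, int handle_ok)
{
  int zf = model_aes_kl_insn (&idata, handle_ok);
  if (zf)
    memset (odata, 0, sizeof *odata);
  else
    *odata = idata;
  return zf;
}

/* Return 1 if every byte of *p is zero, 0 otherwise.  */
int
block_is_zero (const block128 *p)
{
  for (int i = 0; i < 16; i++)
    if (p->b[i] != 0)
      return 0;
  return 1;
}
```

Without the `if (zf)` branch, the failing path would copy the unchanged idata into odata, which is the unconditional move the description calls unsafe.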
gcc/ChangeLog:
* config/i386/i386-expand.
> On Thu, Jul 1, 2021 at 3:51 PM Hongyu Wang wrote:
> >
> > For Keylocker aesenc/aesdec intrinsics, current implementation
> > moves idata to odata unconditionally, which causes safety issue when
> > the instruction meets runtime error. So we add a branch to clear
>
Updated patch with a minor change to move the variable declaration after the comment.
Hongtao, could you help check in the patch?
Hongyu Wang wrote on Thu, Jul 1, 2021 at 4:16 PM:
>
> > Change some keylocker insn to Keylocker aesenc/aesdec in comments.
> > others LGTM.
>
> Changed.
This caused
XPASS: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 1
XPASS: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
liuhongt via Gcc-patches wrote on Thu, Jul 1, 2021 at 3:45 PM:
>
> Bootstrapped
Hi,
For instructions like cvtss2si, there is no need to output the 'l'
or 'q' suffixes, just as for cvtss2usi, since the output operand is always
a register and those suffixes are only used to distinguish ambiguous
memory operands.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
OK for
>
> On Fri, Jul 2, 2021 at 10:30 AM Hongyu Wang wrote:
> >
> > Hi,
> >
> > For instructions like cvtss2si, there is no need to output the 'l'
> > or 'q' suffixes just like cvtss2usi, since the output operand is always
> > regi
Uros Bizjak wrote on Fri, Jul 2, 2021 at 7:07 PM:
>
> On Fri, Jul 2, 2021 at 12:48 PM Hongyu Wang wrote:
> >
> > >
> > > On Fri, Jul 2, 2021 at 10:30 AM Hongyu Wang wrote:
> > > >
> > > > Hi,
> > > >
> > > > For instructions like
Hi,
Following the discussion in PR107602, -munroll-only-small-loops
does not turn -funroll-loops on or off, and the current check in
pass_rtl_unroll_loops::gate would cause -funroll-loops to not take
effect. Revert the change about targetm.loop_unroll_adjust and apply
the backend option change to stri
1 AM Liu, Hongtao via Gcc-patches
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Wang, Hongyu
> > > Sent: Saturday, November 19, 2022 2:26 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: richard.guent...@gmail.com; ub
he middle-end part with target hook looks quite tricky (and of
course the OPTION_SET_P in the target hook). So Richard if you agree,
I'd like to install the reversion patch posted in
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606774.html
and move all them to the backend first.
--
Regards,
Hongyu, Wang
a patch in
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606478.html
to change the rs6000 target and there is a discussion ongoing. If we
draw a conclusion that we only want to make changes in the backend, I
will install this one.
--
Regards,
Hongyu, Wang
For Alderlake there is an issue similar to PR 81616; enabling
avoid_fma256_chain will also benefit Intel's latest platforms,
Alderlake and Sapphire Rapids.
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for master?
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_
From: wwwhhhyyy
Hi,
For the GoldenCove micro-architecture, force-insert a zero idiom in the asm
template to break the false dependency on the dest register for several insns.
The related insns are:
VPERM/D/Q/PS/PD
VRANGEPD/PS/SD/SS
VGETMANTSS/SD/SH
VGETMANTPS/PD - mem version only
VPMULLQ
VFMULCSH/PH
VFCMULCSH/
3, 2022 at 8:28 AM Hongyu Wang wrote:
> >
> > From: wwwhhhyyy
> >
> > Hi,
> >
> > For GoldenCove micro-architecture, force insert zero-idiom in asm
> > template to break false dependency of dest register for several insns.
> >
> > Th
increases maintenance effort. If we split
them at the epilogue_complete stage,
it does not seem much different from putting it under the output template...
Hongyu Wang wrote on Fri, Jan 14, 2022 at 13:38:
>
> > No, the approach is wrong. You have to solve output clearing on RTL
> > level, please look at how e
ition
(which means there is true dependency).
Uros Bizjak wrote on Fri, Jan 14, 2022 at 16:37:
>
> On Fri, Jan 14, 2022 at 7:11 AM Hongyu Wang wrote:
> >
> > > > No, the approach is wrong. You have to solve output clearing on RTL
> > > > level, please look at how e.g. t
.
I added reg_mentioned_p for all insns except fp16 complex mult, since
they have the & (earlyclobber) constraint on the dest, so it must be
allocated a different register from src.
Uros Bizjak wrote on Fri, Jan 14, 2022 at 23:49:
>
> On Fri, Jan 14, 2022 at 2:44 PM Hongyu Wang wrote:
> >
> > > Are there any technic
Thanks for Jeff's comment.
> Presumably the difficulty here is we need to find a suitable hard
> register so that we can emit the vsetvl.
Yes. We use the GPR which has been flagged in need_zeroed_regs to
hold the vl. There should be one GPR we can use; otherwise, we will throw
an exception.
> Do
Hi Kito, Juzhe, Jeff,
Thanks for your kind reviews. I have made changes based on the comments and ran
the testsuite locally. Could you please take another look? If there are any more
comments, please let me know.
Thanks
Yanzhang
> -Original Message-
> From: Wang, Yanzhang
> Sent