[Patch] x86: Enable GCC support for Intel Hreset extension

2020-10-13 Thread Hongyu Wang via Gcc-patches
/other/i386-3.C: Likewise. -- Regards, Hongyu, Wang From 9dbb6bfb28431cd52149e12cc5f359be7fb46c64 Mon Sep 17 00:00:00 2001 From: Hongyu Wang Date: Tue, 7 Apr 2020 18:39:53 + Subject: [PATCH] Enable Intel HRESET Instruction gcc/ * common/config/i386/cpuinfo.h (get_available_features

[Patch] x86: Enable GCC support for Intel AVX-VNNI extension

2020-10-13 Thread Hongyu Wang via Gcc-patches
://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf Bootstrap ok, regression test on i386/x86 backend is ok. OK for master? 2020-10-13 Hongtao Liu Hongyu Wang gcc/ * common/config/i386/cpuinfo.h

Re: [Patch] x86: Enable support for Intel UINTR extension

2020-10-14 Thread Hongyu Wang via Gcc-patches
et passed. And there is no intrinsic/builtin with const int parameter. So we remove -muintr from these files. Uros Bizjak wrote on Wed, Oct 14, 2020 at 2:18 PM: > On Tue, Oct 13, 2020 at 10:30 AM Hongyu Wang > wrote: > > > > Hi: > > > > This patch is about to support User

Re: [Patch] x86: Enable support for Intel UINTR extension

2020-10-14 Thread Hongyu Wang via Gcc-patches
Uros Bizjak wrote on Wed, Oct 14, 2020 at 4:42 PM: > On Wed, Oct 14, 2020 at 10:34 AM Hongyu Wang > wrote: > > > > > > > > Please also add -muintr to g++.dg/other/i386-{2,3}.C and > > > gcc.target/i386-sse-{12,13,14,22,23}.c. This will test new intrinsics > >

Re: [Patch] x86: Enable support for Intel UINTR extension

2020-10-14 Thread Hongyu Wang via Gcc-patches
Uros Bizjak wrote on Wed, Oct 14, 2020 at 4:53 PM: > > On Wed, Oct 14, 2020 at 10:42 AM Uros Bizjak wrote: > > > > On Wed, Oct 14, 2020 at 10:34 AM Hongyu Wang wrote: > > > > > > > > > > > Please also add -muintr to g++.dg/other/i386-{2,3}.C and > >

Re: [Patch] x86: Enable support for Intel UINTR extension

2020-10-14 Thread Hongyu Wang via Gcc-patches
Uros Bizjak wrote on Wed, Oct 14, 2020 at 7:19 PM: > > > > Please also add -muintr to g++.dg/other/i386-{2,3}.C and > > > >> > > gcc.target/i386-sse-{12,13,14,22,23}.c. This will test new intrinsics > > > >> > > header. > > > >> > > > > > >> > > > > >> > Thanks for your review. We found that without adding -mui

Re: [Patch] x86: Enable support for Intel UINTR extension

2020-10-14 Thread Hongyu Wang via Gcc-patches
help. H.J. Lu wrote on Wed, Oct 14, 2020 at 9:35 PM: > > On Wed, Oct 14, 2020 at 6:31 AM Hongyu Wang via Gcc-patches > wrote: > > > > Uros Bizjak wrote on Wed, Oct 14, 2020 at 7:19 PM: > > > > > > > > Please also add -muintr to g++.dg/other/i386-{2,3}.C and > > > >

Re: [Patch] x86: Enable GCC support for Intel Hreset extension

2020-10-14 Thread Hongyu Wang via Gcc-patches
> > The patch doesn't include all testsuite changes. > Yes, I updated -mhreset in x86gprintrin-{1,2,3,4,5}.c. We will check in the attached patch. Thanks. Uros Bizjak wrote on Wed, Oct 14, 2020 at 2:26 PM: > > On Tue, Oct 13, 2020 at 10:49 AM Hongyu Wang wrote: > > > > Hi

Re: PING [PATCH] Enable GCC support for Intel Key Locker extension

2020-10-21 Thread Hongyu Wang via Gcc-patches
Hi, > IIRC, adding a new regclass is O(n^2), so it should be avoided. I > think that the new patterns should follow the same path as vzeroall > and vzeroupper patterns, where we emit the pattern with explicit hard > regs. > > BTW: We do have SSE_FIRST_REG class, but this class was added to solve >

PING [Patch] x86: Enable GCC support for Intel AVX-VNNI extension

2020-10-28 Thread Hongyu Wang via Gcc-patches
Hongyu Wang wrote on Wed, Oct 14, 2020 at 11:27 AM: > > Hi: > > This patch is about to support Intel AVX-VNNI instructions. > > AVX-VNNI is an equivalent to AVX512-VNNI with VEX encoding. The instructions > are same, but with extra {vex} prefix to distinguish from AVX512-VNNI > i

Re: PING [PATCH] Enable GCC support for Intel Key Locker extension

2020-10-28 Thread Hongyu Wang via Gcc-patches
Oct 21, 2020 at 1:48 PM Uros Bizjak wrote: > > > > On Wed, Oct 21, 2020 at 11:11 AM Hongyu Wang wrote: > > > > > > Hi, > > > > > > > IIRC, adding a new regclass is O(n^2), so it should be avoided. I > > > > think that the new patter

Re: PING [PATCH] Enable GCC support for Intel Key Locker extension

2020-10-28 Thread Hongyu Wang via Gcc-patches
c. Thanks for all the helpful comments. Updated patch. Hongtao Liu wrote on Thu, Oct 29, 2020 at 9:53 AM: > > On Wed, Oct 28, 2020 at 8:24 PM Uros Bizjak wrote: > > > > On Wed, Oct 28, 2020 at 10:54 AM Hongyu Wang wrote: > > > > > > Hi Uros, > > > >

Re: PING [PATCH] Enable GCC support for Intel Key Locker extension

2020-10-29 Thread Hongyu Wang via Gcc-patches
Thanks for your review! I'll ask Hongtao to check in the patch for me. Uros Bizjak wrote on Thu, Oct 29, 2020 at 4:08 PM: > > On Thu, Oct 29, 2020 at 7:52 AM Hongyu Wang wrote: > > > > Hi Uros, > > > > > is there a reason to introduce all these (with corresponding c

[PATCH] Optimize VEC_PERM_EXPR with same permutation index and operation [PR98167]

2022-11-03 Thread Hongyu Wang via Gcc-patches
Hi, This is a follow-up patch for PR98167. The sequence c1 = VEC_PERM_EXPR (a, a, mask); c2 = VEC_PERM_EXPR (b, b, mask); c3 = c1 op c2 can be optimized to c = a op b; c3 = VEC_PERM_EXPR (c, c, mask) for all integer vector operations, and for float operations with a full permutation.
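As a hedged illustration of the transformation in this snippet (not code from the patch), the following C sketch models VEC_PERM_EXPR (v, v, mask) on plain 4-element integer arrays and checks that the two forms agree. The names perm, perm_then_add, add_then_perm, forms_agree and self_check are made up for this example, and addition stands in for the generic op.

```c
#include <assert.h>

#define N 4

/* Model of VEC_PERM_EXPR (v, v, mask): since both inputs are the same
   vector, lane i of the result is simply v[mask[i] % N].  */
void perm(const int *v, const unsigned *mask, int *out)
{
  for (int i = 0; i < N; i++)
    out[i] = v[mask[i] % N];
}

/* Original form: c1 = perm(a), c2 = perm(b), c3 = c1 + c2.  */
void perm_then_add(const int *a, const int *b, const unsigned *mask, int *c3)
{
  int c1[N], c2[N];
  perm(a, mask, c1);
  perm(b, mask, c2);
  for (int i = 0; i < N; i++)
    c3[i] = c1[i] + c2[i];
}

/* Optimized form: c = a + b, c3 = perm(c).  Only one permutation.  */
void add_then_perm(const int *a, const int *b, const unsigned *mask, int *c3)
{
  int c[N];
  for (int i = 0; i < N; i++)
    c[i] = a[i] + b[i];
  perm(c, mask, c3);
}

/* Returns 1 when both forms produce the same lanes.  */
int forms_agree(const int *a, const int *b, const unsigned *mask)
{
  int x[N], y[N];
  perm_then_add(a, b, mask, x);
  add_then_perm(a, b, mask, y);
  for (int i = 0; i < N; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}

/* Example use, with a mask that repeats lane 0.  */
void self_check(void)
{
  int a[N] = {1, 2, 3, 4};
  int b[N] = {10, 20, 30, 40};
  unsigned mask[N] = {3, 1, 0, 0};
  assert(forms_agree(a, b, mask));
}
```

For integer lanes the two forms agree for any mask, including one that repeats a lane as above. The snippet limits floating-point operations to full permutations, which this integer-only model does not exercise.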

Re: [PATCH V2] Enable small loop unrolling for O2

2022-11-08 Thread Hongyu Wang via Gcc-patches
tunings instead of 1. Yes, here is the updated patch that changes the cost table. Bootstrapped & regtested on x86_64-pc-linux-gnu. Ok for trunk? Hongtao Liu via Gcc-patches wrote on Tue, Nov 8, 2022 at 11:05: > > On Mon, Nov 7, 2022 at 10:25 PM Richard Biener via Gcc-patches > wrote: > >

Re: [PATCH] Optimize VEC_PERM_EXPR with same permutation index and operation [PR98167]

2022-11-09 Thread Hongyu Wang via Gcc-patches
c-patches wrote on Tue, Nov 8, 2022 at 22:38: > > On Fri, Nov 4, 2022 at 7:44 AM Prathamesh Kulkarni via Gcc-patches > wrote: > > > > On Fri, 4 Nov 2022 at 05:36, Hongyu Wang via Gcc-patches > > wrote: > > > > > > Hi, > > > > > > This is a foll

Re: [PATCH] Optimize VEC_PERM_EXPR with same permutation index and operation [PR98167]

2022-11-10 Thread Hongyu Wang via Gcc-patches
xample! We also tried using wide_int as a bitmask, but your code looks simpler and more reasonable. Updated the patch accordingly. Richard Biener wrote on Thu, Nov 10, 2022 at 16:56: > > On Thu, Nov 10, 2022 at 3:27 AM Hongyu Wang wrote: > > > > Hi Prathamesh and Richard, > > > >

RE: [PATCH V2] Enable small loop unrolling for O2

2022-11-10 Thread Wang, Hongyu via Gcc-patches
Thanks for the notification! I was not aware of the compile farm before. I will see what the impact of my patch is then. Regards, Hongyu, Wang From: David Edelsohn Sent: Thursday, November 10, 2022 1:22 AM To: Wang, Hongyu Cc: GCC Patches Subject: Re: [PATCH V2] Enable small loop unrolling for O2

Re: [PATCH V2] Enable small loop unrolling for O2

2022-11-13 Thread Hongyu Wang via Gcc-patches
> Ok, Note GCC documents have been ported to sphinx, so you need to > adjust changes in invoke.texi to new sphinx files. Yes, this is the patch I'm going to check in. Thanks. Hongtao Liu wrote on Mon, Nov 14, 2022 at 09:35: > > On Wed, Nov 9, 2022 at 9:29 AM Hongyu Wang wrote:

[PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

2022-11-14 Thread Hongyu Wang via Gcc-patches
Hi, According to PR 107676, the documentation of -mrelax-cmpxchg-loop is nonsensical. Adjust the wording according to the comments. Bootstrapped on x86_64-pc-linux-gnu, ok for trunk? gcc/ChangeLog: PR target/107676 * doc/invoke.texi: Reword the description of -mrelax-cmpxchg-

Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

2022-11-15 Thread Hongyu Wang via Gcc-patches
issuing an atomic load before the > @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction > to save CPU power when restarting the loop. > > Alexander From e82f3e03115480ac3d055819658a107249932c65 Mon Sep 17 00:00:00 2001 From: Hongyu Wang Date: Tue, 15 Nov 2022 11:16:17 +0800 Subje
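The wording quoted in this snippet (an atomic load before the CMPXCHG, and a PAUSE when restarting the loop) can be sketched by hand with C11 atomics. This is only an analogue written for illustration, not the code GCC emits under -mrelax-cmpxchg-loop. The function relaxed_fetch_nand, the cpu_relax macro and self_check are hypothetical names, and the PAUSE hint is assumed to be reachable through __builtin_ia32_pause on x86 with GCC or Clang.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Spin-wait hint: PAUSE on x86 with GCC/Clang, otherwise a no-op
   (assumption made for this sketch).  */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define cpu_relax() __builtin_ia32_pause()
#else
#define cpu_relax() ((void)0)
#endif

/* Atomically replace *p with ~(*p & bits) and return the old value,
   using a compare-exchange loop with a "relaxed" retry path.  */
uint32_t relaxed_fetch_nand(_Atomic uint32_t *p, uint32_t bits)
{
  uint32_t expected = atomic_load_explicit(p, memory_order_relaxed);
  for (;;)
    {
      uint32_t desired = ~(expected & bits);

      /* Plain atomic load first: if another thread already changed *p,
         the compare-exchange would fail anyway, so skip the locked
         instruction, pause, and retry with the fresh value.  */
      uint32_t current = atomic_load_explicit(p, memory_order_relaxed);
      if (current != expected)
        {
          expected = current;
          cpu_relax();
          continue;
        }

      /* On success 'expected' still holds the value before the update;
         on failure it is refreshed with the current value.  */
      if (atomic_compare_exchange_strong(p, &expected, desired))
        return expected;
      cpu_relax();
    }
}

/* Single-threaded example use.  */
void self_check(void)
{
  _Atomic uint32_t v = 0xF0u;
  uint32_t old = relaxed_fetch_nand(&v, 0x3Cu);
  assert(old == 0xF0u);
  assert(atomic_load(&v) == 0xFFFFFFCFu);
}
```

The point of the extra load is that a failed check costs only a read, while a failed locked compare-exchange still has to acquire the cache line exclusively.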

Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

2022-11-16 Thread Hongyu Wang via Gcc-patches
> Please use 'git commit --author' to indicate authorship of the patch > (or simply let me push it once approved). Yes, just change the author and push it. Thanks for your help!

[PATCH] rs6000: Adjust loop_unroll_adjust to match middle-end change [PR 107692]

2022-11-16 Thread Hongyu Wang via Gcc-patches
Hi, r13-3950-g071e428c24ee8c enables O2 small loop unrolling, but it breaks -fno-unroll-loops for rs6000 with loop_unroll_adjust hook. Adjust the option handling and target hook accordingly. Bootstrapped & regtested on powerpc64le-linux-gnu, OK for trunk? gcc/ChangeLog: PR target/107692

Re: [PATCH] Optimize VEC_PERM_EXPR with same permutation index and operation [PR98167]

2022-11-16 Thread Hongyu Wang via Gcc-patches
> I assume the "full permutation" condition is to avoid performing some > extra operations that would raise exception flags. If so, are there > conditions (-fno-trapping-math?) where the transformation would be safe > with arbitrary shuffles? Yes, that could be an alternative choice with -fno-trap

Re: [PATCH] MAINTAINERS: Add myself for write after approval

2022-06-27 Thread Hongyu Wang via Gcc-patches
According to the official guide, please sort by your last name in alphabetical order, which means you should put your name between Dave Korn and Julia Koval. Kong, Lingling via Gcc-patches wrote on Mon, Jun 27, 2022 at 16:05: > > Hi, > > I want to add myself to MAINTAINERS for write after approval. > > OK for maste

Re: [PATCH] MAINTAINERS: Add myself for write after approval

2022-06-27 Thread Hongyu Wang via Gcc-patches
Sorry, it should be between Boris Kolpackov and Dave Korn. Hongyu Wang wrote on Mon, Jun 27, 2022 at 16:29: > > According to the official guide, please sort your last name in > alphabetical order, which means you should put your name between > > Dave Korn > Julia Koval > > Kong, Lingling v

Re: [PATCH 3/3] x86: Update memcpy/memset inline strategies for -mtune=generic

2021-03-22 Thread Hongyu Wang via Gcc-patches
of increasing CLEAR_RATIO on > > Hongyue, please collect code size differences on SPEC CPU 2017 and > eembc. > > > SPEC/eembc? Did you play with other values of MOVE/CLEAR_RATIO? > > 17 memory-to-memory/memory-clear insns looks quite a lot. > > > > Yes, we did. 256 bytes is the threshold above which memcpy/memset in libc > win. Below 256 bytes, 16 by_pieces move/store is faster. > > -- > H.J. -- Regards, Hongyu, Wang

Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake

2021-03-31 Thread Hongyu Wang via Gcc-patches
and using short sequence > > for those would be nice. > > > > Having minsize non-trivial may not be that uncommon these days either > > given that we track value ranges (and under assumption that > > memcpy/memset expanders was updated to take these into account). > > > > Hongyu has done some analysis on this. Hongyu, can you share what > you got? > > Thanks. > > -- > H.J. -- Regards, Hongyu, Wang

Re: [PATCH 2/3] x86: Update memcpy/memset inline strategies for Skylake family CPUs

2021-04-06 Thread Hongyu Wang via Gcc-patches
> Do you know what of the three changes (preferring reps/stosb, > CLEAR_RATIO and algorithm choice changes) cause the two speedups > on eembc? An extracted testcase from nnet_test is at https://godbolt.org/z/c8KdsohTP. This loop is transformed to builtin_memcpy and builtin_memset with size 280. Curre

[PATCH] [i386] Clear odata for aes(enc|dec)(wide)?kl intrinsics

2021-07-01 Thread Hongyu Wang via Gcc-patches
For Keylocker aesenc/aesdec intrinsics, the current implementation moves idata to odata unconditionally, which causes a safety issue when the instruction hits a runtime error. So we add a branch to clear odata when ZF is set after instruction execution. gcc/ChangeLog: * config/i386/i386-expand.
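The control flow described in this snippet can be modelled in plain C without Key Locker hardware. Everything below is a hypothetical stand-in written for illustration: block128 replaces the real vector type, model_aes_insn is a placeholder that is not AES and not the real instruction, and model_intrinsic only mirrors the "clear odata when ZF is set" branch, not the actual GCC expander.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 128-bit data block (hypothetical type).  */
typedef struct { uint8_t bytes[16]; } block128;

/* Hypothetical model of a Key Locker AES instruction.  Returns the ZF
   value: 1 on failure (data left untouched), 0 on success.  The XOR is
   a placeholder transform, not AES.  */
int model_aes_insn(block128 *data, int handle_ok)
{
  if (!handle_ok)
    return 1;
  for (int i = 0; i < 16; i++)
    data->bytes[i] ^= 0x5a;
  return 0;
}

/* Wrapper with the behaviour described in the snippet: the result is
   stored to odata only on success; when ZF is set, odata is cleared
   instead of receiving a copy of idata.  */
int model_intrinsic(block128 *odata, block128 idata, int handle_ok)
{
  block128 tmp = idata;
  int zf = model_aes_insn(&tmp, handle_ok);
  if (zf)
    memset(odata, 0, sizeof *odata);
  else
    *odata = tmp;
  return zf;
}

/* Example use: a failing call must not leak the input into odata.  */
void self_check(void)
{
  block128 in, out;
  memset(&in, 0x11, sizeof in);
  memset(&out, 0xff, sizeof out);
  assert(model_intrinsic(&out, in, 0) == 1);
  for (int i = 0; i < 16; i++)
    assert(out.bytes[i] == 0);
}
```

Without the clearing branch, a failed call in this model would leave the unprocessed input in odata, which is the kind of safety issue the snippet refers to.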

Re: [PATCH] [i386] Clear odata for aes(enc|dec)(wide)?kl intrinsics

2021-07-01 Thread Hongyu Wang via Gcc-patches
> On Thu, Jul 1, 2021 at 3:51 PM Hongyu Wang wrote: > > > > For Keylocker aesenc/aesdec intrinsics, current implementation > > moves idata to odata unconditionally, which causes safety issue when > > the instruction meets runtime error. So we add a branch to clear

Re: [PATCH] [i386] Clear odata for aes(enc|dec)(wide)?kl intrinsics

2021-07-01 Thread Hongyu Wang via Gcc-patches
Updated patch with a minor change to move the variable declaration after the comment. Hongtao, could you help check in the patch? Hongyu Wang wrote on Thu, Jul 1, 2021 at 4:16 PM: > > > Change some keylocker insn to Keylocker aesenc/aesdec in comments. > > others LGTM. > > Changed.

Re: [PATCH] Fix typo in standard pattern name of trunc2.

2021-07-02 Thread Hongyu Wang via Gcc-patches
This caused XPASS: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 1 XPASS: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2 liuhongt via Gcc-patches wrote on Thu, Jul 1, 2021 at 3:45 PM: > > Bootstrapped

[PATCH] [i386] Remove rex64suffix for v?cvtt?(ss|sd)*2si

2021-07-02 Thread Hongyu Wang via Gcc-patches
Hi, For instructions like cvtss2si, there is no need to output the 'l' or 'q' suffixes, just like cvtss2usi, since the output operand is always a register and those suffixes are only used to distinguish ambiguous memory operands. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. OK for

Re: [PATCH] [i386] Remove rex64suffix for v?cvtt?(ss|sd)*2si

2021-07-02 Thread Hongyu Wang via Gcc-patches
> > On Fri, Jul 2, 2021 at 10:30 AM Hongyu Wang wrote: > > > > Hi, > > > > For instructions like cvtss2si, there is no need to output the 'l' > > or 'q' suffixes just like cvtss2usi, since the output operand is always > > regi

Re: [PATCH] [i386] Remove rex64suffix for v?cvtt?(ss|sd)*2si

2021-07-02 Thread Hongyu Wang via Gcc-patches
Uros Bizjak wrote on Fri, Jul 2, 2021 at 7:07 PM: > > On Fri, Jul 2, 2021 at 12:48 PM Hongyu Wang wrote: > > > > > > > > On Fri, Jul 2, 2021 at 10:30 AM Hongyu Wang wrote: > > > > > > > > Hi, > > > > > > > > For instructions like

[PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-18 Thread Hongyu Wang via Gcc-patches
Hi, Following the discussion in PR 107602, -munroll-only-small-loops does not turn -funroll-loops on or off, and the current check in pass_rtl_unroll_loops::gate would cause -funroll-loops not to take effect. Revert the change about targetm.loop_unroll_adjust and apply the backend option change to stri

Re: [PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-20 Thread Hongyu Wang via Gcc-patches
1 AM Liu, Hongtao via Gcc-patches > wrote: > > > > > > > > > -Original Message- > > > From: Wang, Hongyu > > > Sent: Saturday, November 19, 2022 2:26 PM > > > To: gcc-patches@gcc.gnu.org > > > Cc: richard.guent...@gmail.com; ub

Re: [PATCH] rs6000: Adjust loop_unroll_adjust to match middle-end change [PR 107692]

2022-11-22 Thread Hongyu Wang via Gcc-patches
he middle-end part with target hook looks quite tricky (and of course the OPTION_SET_P in the target hook). So Richard if you agree, I'd like to install the reversion patch posted in https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606774.html and move all them to the backend first. -- Regards, Hongyu, Wang

Re: [PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-22 Thread Hongyu Wang via Gcc-patches
a patch in https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606478.html to change the rs6000 target and there is a discussion ongoing. If we draw a conclusion that we only want to make changes in the backend, I will install this one. -- Regards, Hongyu, Wang

[PATCH] i386: Avoid fma_chain for -march=alderlake and sapphirerapids.

2022-12-06 Thread Hongyu Wang via Gcc-patches
For Alder Lake there is an issue similar to PR 81616; enabling avoid_fma256_chain also benefits Intel's latest platforms, Alder Lake and Sapphire Rapids. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. Ok for master? gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_

[PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-12 Thread Hongyu Wang via Gcc-patches
From: wwwhhhyyy Hi, For GoldenCove micro-architecture, force insert zero-idiom in asm template to break false dependency of dest register for several insns. The related insns are: VPERM/D/Q/PS/PD VRANGEPD/PS/SD/SS VGETMANTSS/SD/SH VGETMANDPS/PD - mem version only VPMULLQ VFMULCSH/PH VFCMULCSH/

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-13 Thread Hongyu Wang via Gcc-patches
3, 2022 at 8:28 AM Hongyu Wang wrote: > > > > From: wwwhhhyyy > > > > Hi, > > > > For GoldenCove micro-architecture, force insert zero-idiom in asm > > template to break false dependency of dest register for several insns. > > > > Th

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-13 Thread Hongyu Wang via Gcc-patches
increases maintenance effort. If we split them at epilogue_complete stage, it seems not much difference to put it under output template... Hongyu Wang wrote on Fri, Jan 14, 2022 at 13:38: > > > No, the approach is wrong. You have to solve output clearing on RTL > > level, please look at how e

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-14 Thread Hongyu Wang via Gcc-patches
ition (which means there is true dependency). Uros Bizjak wrote on Fri, Jan 14, 2022 at 16:37: > > On Fri, Jan 14, 2022 at 7:11 AM Hongyu Wang wrote: > > > > > > No, the approach is wrong. You have to solve output clearing on RTL > > > > level, please look at how e.g. t

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-15 Thread Hongyu Wang via Gcc-patches
. I added reg_mentioned_p for all insns except fp16 complex mult, since they have constraint & to the dest so it must be allocated different register from src. Uros Bizjak wrote on Fri, Jan 14, 2022 at 23:49: > > On Fri, Jan 14, 2022 at 2:44 PM Hongyu Wang wrote: > > > > > Are there any technic

RE: [PATCH v3] RISC-V: Fix regression of -fzero-call-used-regs=all

2023-04-09 Thread Wang, Yanzhang via Gcc-patches
Thanks for Jeff's comment. > Presumably the difficulty here is we need to find a suitable hard > register so that we can emit the vsetvl. Yes. We use the GPR which has been flagged in the need_zeroed_regs to hold the vl. There should be one GPR we can use; otherwise, an exception will be thrown. > Do

RE: [PATCH v5] RISC-V: Fix regression of -fzero-call-used-regs=all

2023-04-11 Thread Wang, Yanzhang via Gcc-patches
Hi Kito, Juzhe, Jeff, Thanks for your kind reviews. I have modified the patch based on the comments and ran the testsuite locally. Could you please take another look? If you have any more comments, please let me know. Thanks Yanzhang > -Original Message- > From: Wang, Yanzhang > Sent
