from:"Haochen Jiang"

[PATCH] i386: Decouple AMX-AVX512 from AVX10.2 and imply AVX512F

2025-07-14 Thread Haochen Jiang

Hi all, In ISE058, AMX-AVX512 no longer implies AVX10.2. This calls for reconsidering what AMX-AVX512 should imply. Since it uses zmm registers and only zmm registers, it needs to imply at least AVX512F. AVX512VL is not needed. On the other hand, if we imply AVX10.1 for AMX-AVX51

[PATCH] i386: Remove KEYLOCKER related feature since Panther Lake and Clearwater Forest

2025-07-13 Thread Haochen Jiang

Hi all, According to the July 2025 SDM, Key Locker will no longer be supported on hardware from 2025 onwards. This means that for Panther Lake and Clearwater Forest, the feature will not be enabled. Remove it from those two platforms. Ok for trunk and backport to GCC14/15? Thx, Haochen gcc/ChangeLog:

[PATCH] i386: Change Diamond Rapids feature detect when model number could not be distinguished

2025-07-01 Thread Haochen Jiang

Hi all, We will use AMX-FP8 for DMR since it is a smaller and more unique feature. Ok for trunk and backport to GCC 15? Thx, Haochen gcc/ChangeLog: * config/i386/driver-i386.cc (host_detect_local_cpu): Change to AMX-FP8 for Diamond Rapids. --- gcc/config/i386/driver-i386.cc |

[PATCH] i386: Remove CLDEMOTE for clients

2025-06-19 Thread Haochen Jiang

Hi all, CLDEMOTE is not enabled on clients according to SDM. SDM only mentioned it will be enabled on Xeon and Atom servers, not clients. Remove them since Alder Lake (where it is introduced). Also will backport this patch to GCC12/13/14/15 with some tweaks in texi change. Ok for trunk? Thx, Ha

[gcc-wwwdocs PATCH] gcc-15: Correct DMR ISA base platform to include AMX-COMPLEX

2025-06-12 Thread Haochen Jiang

Hi all, I just found that since AMX-COMPLEX is enabled on Diamond Rapids but not enabled on Granite Rapids, we should use the ISA level from Granite Rapids D instead of Granite Rapids to show that. Since Diamond Rapids is the actual successor of Granite Rapids but not Granite Rapids D, I slightly

[PATCH] doc: Fix extend.texi menu

2025-05-27 Thread Haochen Jiang

Hi all, commit 517c9487f8fdc4e4e90252a9365e5823259dc783 Author: Alejandro Colomar Date: Thu May 22 01:15:36 2025 +0200 c: Add _Countof operator [PR117025] broke gcc build on RHEL 9 when building texi files: gcc/doc/extend.texi:6: node `C Extensions' lacks menu item for `_Countof' despite

[PATCH 1/7] i386: Remove -mavx10.1-256/512 options

2025-05-14 Thread Haochen Jiang

As we mentioned in GCC 15, we will remove avx10.1-256/512 in GCC 16. The behavior of combining AVX10 and AVX512 options will also be simplified in GCC 16, since AVX10.1 now implies AVX512, making the behavior match everyone else. gcc/ChangeLog: * common/config/i386/cpuinfo.h

[PATCH 4/7] i386: Remove duplicate iterators in md

2025-05-14 Thread Haochen Jiang

There are several iterators no longer needed in md files since after refactor in AVX10, they could directly use legacy AVX512 ones. Remove those duplicate iterators. gcc/ChangeLog: * config/i386/sse.md (VF1_VF2_AVX10_2): Removed. (VF2_AVX10_2): Ditto. (VI1248_AVX10_2): Dit

[PATCH v2 0/7] Remove -mavx10.1-256/512 and -mno-evex512

2025-05-14 Thread Haochen Jiang

Hi all, This is the v2 patch to remove -mavx10.1-256/512 and -mno-evex512. I suppose this time all the patches will not be held due to size. As mentioned in GCC 15, we will remove -mavx10.1-256/512 and -mno-evex512 options in GCC 16. Also we will do some clean up in code for all the size happenin

[PATCH 0/5] Remove -mavx10.1-256/512 and -mno-evex512

2025-05-13 Thread Haochen Jiang

Hi all, As mentioned in GCC 15, we will remove -mavx10.1-256/512 and -mno-evex512 options in GCC 16. Also we will do some clean up in code for all the size happening all together. The first patch of the patch set removes those options, while the following four is refactoring and cleaning up for t

[PATCH 2/5] i386: Remove duplicate iterators in md

2025-05-13 Thread Haochen Jiang

There are several iterators no longer needed in md files since after refactor in AVX10, they could directly use legacy AVX512 ones. Remove those duplicate iterators. gcc/ChangeLog: * config/i386/sse.md (VF1_VF2_AVX10_2): Removed. (VF2_AVX10_2): Ditto. (VI1248_AVX10_2): Dit

[PATCH] i386: Add PTA_AVX10_1_256 to PTA_DIAMONDRAPIDS

2025-04-05 Thread Haochen Jiang

Hi all, For -march= handling, PTA_AVX10_1 will not imply PTA_AVX10_1_256, resulting in TARGET_AVX10_1 becoming true while TARGET_AVX10_1_256 false. Since we will check TARGET_AVX10_1_256 in GCC 15 for AVX512 feature enabling for AVX10, -march=diamondrapids will not enable 512 bit register and x/ym
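The effect described above can be illustrated with a minimal sketch. The `DEMO_` names below are hypothetical stand-ins and are not GCC's actual `PTA_*` definitions; the sketch only assumes that each feature is a separate bit that has to be set explicitly.

```c
#include <stdint.h>

/* Hypothetical feature bits standing in for PTA_AVX10_1 and
   PTA_AVX10_1_256.  Setting one bit does not set the other.  */
#define DEMO_PTA_AVX10_1      (UINT32_C(1) << 0)
#define DEMO_PTA_AVX10_1_256  (UINT32_C(1) << 1)

/* Return nonzero if FLAGS contains every bit in MASK.  */
int demo_has_feature(uint32_t flags, uint32_t mask)
{
  return (flags & mask) == mask;
}

/* A -march definition that lists only the AVX10.1 bit.  */
uint32_t demo_march_without_256(void)
{
  return DEMO_PTA_AVX10_1;
}

/* The same definition with the 256-bit flag added explicitly.  */
uint32_t demo_march_with_256(void)
{
  return DEMO_PTA_AVX10_1 | DEMO_PTA_AVX10_1_256;
}
```

In this sketch, a check against `DEMO_PTA_AVX10_1_256` fails for the first definition even though `DEMO_PTA_AVX10_1` is set, which is the kind of mismatch the patch subject addresses by adding the flag to the processor definition.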

[PATCH 17/27] Revert "AVX10.2 ymm rounding: Support vcvttph2{, u}{dq, qq, w} intrins"

2025-04-05 Thread Haochen Jiang

This reverts commit 493c5096050523ebc05e5fa21612683a996b97a7. --- gcc/config/i386/avx10_2roundingintrin.h | 335 -- gcc/config/i386/i386-builtin.def | 6 - gcc/config/i386/sse.md| 10 +- gcc/config/i386/subst.md |

[PATCH 27/27] i386: Raise deprecate warning for -mavx10.1-256/512 and -mevex512 while add -mavx10.1 back with 512 bit alias

2025-04-05 Thread Haochen Jiang

When AVX10.1 options are added into GCC 14, E-core is supposed to support up to 256 bit vector width, while P-core up to 512 bit vector width. Therefore, we added avx10.1-256 and avx10.1-512 options into compiler since there will be real platforms with 256 bit only support. At the same time, for ol

[gcc-wwwdocs PATCH] gcc-14/15: Mention recent change for Intel x86_64

2025-03-24 Thread Haochen Jiang

Hi all, This patch will mention recent changes for Intel x86_64 in GCC 14 and 15. Ok for wwwdocs? Thx, Haochen --- Mention AVX10.1 option changes, revise AVX10.2 option and mention APX_F new feature in GCC 15. --- htdocs/gcc-14/changes.html | 12 htdocs/gcc-15/changes.html | 33 +

[PATCH 20/27] Revert "AVX10.2 ymm rounding: Support vcvtph2{, u}w and vcvtps2p{d, hx} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit b70bb94aca7bc10a54f744d793c32c51f91ce195. --- gcc/config/i386/avx10_2roundingintrin.h | 220 -- gcc/config/i386/i386-builtin-types.def| 3 - gcc/config/i386/i386-builtin.def | 4 - gcc/config/i386/i386-expand.cc|

[PATCH 22/27] Revert "AVX10.2 ymm rounding: Support vcvtpd2{, u}{dq, qq} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 508ac49e1a94c28346642bff512d0ed5f4f58b64. --- gcc/config/i386/avx10_2roundingintrin.h | 218 -- gcc/config/i386/i386-builtin-types.def| 2 - gcc/config/i386/i386-builtin.def | 4 - gcc/config/i386/i386-expand.cc|

[PATCH 23/27] Revert "AVX10.2 ymm rounding: Support vcvtdq2p{s, h} and vcvtpd2p{s, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 85e874d19548f0dcb9a3f14f9e4b1e3411c88c4b. --- gcc/config/i386/avx10_2roundingintrin.h | 210 -- gcc/config/i386/i386-builtin-types.def| 4 - gcc/config/i386/i386-builtin.def | 4 - gcc/config/i386/i386-expand.cc|

[PATCH 21/27] Revert "AVX10.2 ymm rounding: Support vcvtph2p{s, d, sx} and vcvtph2{, u}{dq, qq} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 6f2eac53b6026836f3222961c32312e02c2c7dbc. --- gcc/config/i386/avx10_2roundingintrin.h | 384 -- gcc/config/i386/i386-builtin-types.def| 4 - gcc/config/i386/i386-builtin.def | 7 - gcc/config/i386/i386-expand.cc|

[PATCH 19/27] Revert "AVX10.2 ymm rounding: Support vcvtps2{, u}{dq, qq} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 0f5a42d41b46b746c6f77374d76a3b918a1e2b57. --- gcc/config/i386/avx10_2roundingintrin.h | 226 -- gcc/config/i386/i386-builtin-types.def| 2 - gcc/config/i386/i386-builtin.def | 4 - gcc/config/i386/i386-expand.cc|

[PATCH 18/27] Revert "AVX10.2 ymm rounding: Support vcvtqq2p{s, d, h} and vcvttpd2{, u}{dq, qq} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 6e231f8504874828b23bbe89f3ef4086dcc15a44. --- gcc/config/i386/avx10_2roundingintrin.h | 390 -- gcc/config/i386/i386-builtin-types.def| 3 - gcc/config/i386/i386-builtin.def | 7 - gcc/config/i386/i386-expand.cc|

[PATCH 16/27] Revert "AVX10.2 ymm rounding: Support vcvttps2{, u}{dq, qq} and vcvtu{dq, qq}2p{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit b2754227139512adecb6fda067632b587ff4a017. --- gcc/config/i386/avx10_2roundingintrin.h | 492 -- gcc/config/i386/i386-builtin.def | 9 - gcc/config/i386/sse.md| 27 +- gcc/testsuite/gcc.target/i386/avx-1.c |

[PATCH 06/27] Revert "AVX10.2 ymm rounding: Support vmulp{s, d, h} and vrangep{s, d} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 90cc5b0c4609a9fb3257d2cce7b7abc896c6faab. --- gcc/config/i386/avx10_2roundingintrin.h | 313 -- gcc/config/i386/i386-builtin-types.def| 2 - gcc/config/i386/i386-builtin.def | 5 - gcc/config/i386/i386-expand.cc|

[PATCH 24/27] Revert "AVX10.2 ymm rounding: Support vadd{s, d, h} and vcmp{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit e22e3af1954469c40b139b7cfa8e7708592f4bfd. --- gcc/config.gcc| 3 +- gcc/config/i386/avx10_2roundingintrin.h | 337 -- gcc/config/i386/i386-builtin-types.def| 6 - gcc/config/i386/i386-builtin.def |

[PATCH 08/27] Revert "AVX10.2 ymm rounding: Support vgetexpp{s, d, h} and vgetmantp{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 8d4f542935c09f40bb7fd8fd863cc8df80cc970e. --- gcc/config/i386/avx10_2roundingintrin.h | 341 -- gcc/config/i386/i386-builtin-types.def| 6 - gcc/config/i386/i386-builtin.def | 6 - gcc/config/i386/i386-expand.cc|

[PATCH 14/27] Revert "AVX10.2 ymm rounding: Support vfc{madd, mul}cph, vfixupimmp{s, d} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 95980b292b24110d3f1dffb81926df23c61b4fe7. --- gcc/config/i386/avx10_2roundingintrin.h | 247 -- gcc/config/i386/i386-builtin-types.def| 5 - gcc/config/i386/i386-builtin.def | 10 - gcc/config/i386/i386-expand.cc|

[PATCH 13/27] Revert "AVX10.2 ymm rounding: Support vfmadd{132, 231, 213}p{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 0683ca355a87fd36a2e7ae1721199204ceff4c4c. --- gcc/config/i386/avx10_2roundingintrin.h | 176 -- gcc/config/i386/i386-builtin.def | 9 - gcc/config/i386/sse.md| 2 +- gcc/testsuite/gcc.target/i386/avx-1.c |

[PATCH 05/27] Revert "AVX10.2 ymm rounding: Support vreducep{s, d, h} and vrndscalep{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 9afa5081212e1fc3cb2c4efc9b4f421eecf68810. --- gcc/config/i386/avx10_2roundingintrin.h | 367 -- gcc/config/i386/i386-builtin.def | 6 - gcc/config/i386/sse.md| 4 +- gcc/testsuite/gcc.target/i386/avx-1.c |

[PATCH 07/27] Revert "AVX10.2 ymm rounding: Support v{max, min}p{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit cc8a7596477e9d6ac972aadabbb2fd88baa1abf4. --- gcc/config/i386/avx10_2roundingintrin.h | 360 -- gcc/config/i386/i386-builtin.def | 6 - gcc/testsuite/gcc.target/i386/avx-1.c | 6 - .../gcc.target/i386/avx10_2-rounding-3.c | 5

[PATCH 15/27] Revert "AVX10.2 ymm rounding: Support vcvt{, u}w2ph and vdivp{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 3d1b5530ea1d23e26dc5ab70aa4a2e7b9dc19b50. --- gcc/config/i386/avx10_2roundingintrin.h | 286 -- gcc/config/i386/i386-builtin-types.def| 1 - gcc/config/i386/i386-builtin.def | 5 - gcc/config/i386/i386-expand.cc|

[PATCH 10/27] Revert "AVX10.2 ymm rounding: Support vfmulcph and vfnmadd{132, 231, 213}p{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 6f0aa7add1d9177f60016b32ca9ca8b16b173a56. --- gcc/config/i386/avx10_2roundingintrin.h | 241 -- gcc/config/i386/i386-builtin.def | 11 - gcc/testsuite/gcc.target/i386/avx-1.c | 11 - .../gcc.target/i386/avx10_2-rounding-3.c | 5

[PATCH 11/27] Revert "AVX10.2 ymm rounding: Support vfm{sub, subadd}{132, 231, 213}p{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit dd48acbe85ca55dd23ffafbb917ffe559d13b6a3. --- gcc/config/i386/avx10_2roundingintrin.h | 350 -- gcc/config/i386/i386-builtin.def | 18 - gcc/config/i386/sse.md| 2 +- gcc/testsuite/gcc.target/i386/avx-1.c |

[PATCH 04/27] Revert "AVX10.2 ymm rounding: Support vscalefp{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 1f86cf06c7897f6ab467443b5fe8789cc95fe0c4. --- gcc/config/i386/avx10_2roundingintrin.h | 182 -- gcc/config/i386/i386-builtin.def | 3 - gcc/config/i386/sse.md| 2 +- gcc/testsuite/gcc.target/i386/avx-1.c |

[PATCH 02/27] i386: Remove 256 bit rounding for AVX10.2 saturation convert instructions

2025-03-19 Thread Haochen Jiang

Since we will support 512 bit on both P-core and E-core, 256 bit rounding is not that useful because we currently have rounding feature directly on E-core now and no need to use 256-bit rounding as somehow a workaround. This patch will remove 256 bit rounding in AVX10.2 satcvt intrins. gcc/ChangeL

[PATCH 09/27] Revert "AVX10.2 ymm rounding: Support vfnmsub{132, 231, 213}p{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 0983d406ae2e84394b25248865f51c686b119a57. --- gcc/config/i386/avx10_2roundingintrin.h | 181 -- gcc/config/i386/i386-builtin.def | 9 - gcc/config/i386/sse.md| 2 +- gcc/testsuite/gcc.target/i386/avx-1.c |

[PATCH 12/27] Revert "AVX10.2 ymm rounding: Support vfmaddcph and vfmaddsub{132, 231, 213}p{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit cfbc94eaf167ae7aecd21ee6054556e1cf9d7143. --- gcc/config/i386/avx10_2roundingintrin.h | 238 -- gcc/config/i386/i386-builtin.def | 13 - gcc/config/i386/sse.md| 4 +- gcc/testsuite/gcc.target/i386/avx-1.c |

[PATCH 03/27] Revert "AVX10.2 ymm rounding: Support vsqrtp{s, d, h} and vsubp{s, d, h} intrins"

2025-03-19 Thread Haochen Jiang

This reverts commit 7f62e7104ebc11c4570745972a023579922ef265. --- gcc/config/i386/avx10_2roundingintrin.h | 339 -- gcc/config/i386/i386-builtin.def | 6 - gcc/testsuite/gcc.target/i386/avx-1.c | 6 - .../gcc.target/i386/avx10_2-rounding-3.c | 5

[PATCH 01/27] i386: Remove 256 bit rounding for AVX10.2 minmax and convert instructions

2025-03-19 Thread Haochen Jiang

Since we will support 512 bit on both P-core and E-core, 256 bit rounding is not that useful because we currently have rounding feature directly on E-core now and no need to use 256-bit rounding as somehow a workaround. This patch will remove those in AVX10.2 minmax and convert intrins. gcc/Change

[PATCH 00/27] Use avx10.x as the only option for AVX10 with 512 bit vector support while remove avx10.x-256/512 option and 256 bit rounding support

2025-03-19 Thread Haochen Jiang

Hi all, It is a little late for this change, but I hope it will be a welcome one since it will greatly simplify the compiler options and reduce confusion about combining AVX10 options with AVX512. AVX10 whitepaper just got a major change and it will impact how we design the compiler option,

[PATCH] i386: Remove XFAIL for pr103750 testcases

2025-03-11 Thread Haochen Jiang

Hi all, After commit r15-4510, the following testcases also do not need XFAIL. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/avx512f-pr103750-1.c: Remove XFAIL. * gcc.target/i386/avx512f-pr103750-2.c: Ditto. * gcc.target/i386/avx512fp16-pr103750-

[PATCH] i386: Correct mask width for bf8->fp16 intrin on 256/512 bit

2025-03-04 Thread Haochen Jiang

Hi all, For bf8 -> fp16 convert, when dst is 256 bit, the mask should be 16 bit since 16*16=256, not the 8 bit in the current intrin. In 512 bit intrin, the mask bit is also halved. This patch will fix both of them. Ok for trunk? Thx, Haochen gcc/ChangeLog: * config/i386/avx10_2-512con
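The mask-width arithmetic in this message can be written out as a small sketch. The helper name `demo_mask_bits` is hypothetical and is not part of any GCC header; it only encodes the rule that a write mask carries one bit per destination element.

```c
/* Number of mask bits needed for a vector: one bit per element.
   VECTOR_BITS is the destination width, ELEMENT_BITS the width of
   each destination element (16 for fp16).  */
unsigned demo_mask_bits(unsigned vector_bits, unsigned element_bits)
{
  return vector_bits / element_bits;
}
```

For a 256-bit fp16 destination this gives 16 mask bits (16*16=256), and for a 512-bit destination it gives 32, so a mask type half that size would leave the upper elements uncovered.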

[PATCH] i386: Treat Granite Rapids/Granite Rapids-D/Diamond Rapids similar as Sapphire Rapids in x86-tune.def

2025-02-26 Thread Haochen Jiang

Hi all, Since GNR, GNR-D and DMR are all P-core based, we should treat them just like SPR in tuning for now. Ok for trunk and backport to GCC13/14 for GNR/GNR-D part? Thx, Haochen gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_DEST_FALSE_DEP_FOR_GLC): Add GNR, GNR-D, DMR.

[PATCH] i386: Re-order i386.opt.urls

2025-02-17 Thread Haochen Jiang

(Seems patch not sent out, resending) Hi all, The order of i386.opt.urls needs to be the same as i386.opt according to the auto builder. I thought the urls file was a dict, but it actually is not. Commit as obvious. Thx, Haochen gcc/ChangeLog: * config/i386/i386.opt.urls: Adjust the order for avx1

[committed][PATCH] i386: Regenerate i386.opt.urls

2025-02-16 Thread Haochen Jiang

Hi all, We need to regenerate i386.opt.urls after removing -mavx10.1. Commit as obvious. When backporting to GCC14, I will also include this. Thanks to the autobuilder for reminding me of this: https://builder.sourceware.org/buildbot/#/builders/269/builds/12173/steps/8/logs/stdio Thx, Haochen gcc/ChangeLo

[PATCH] i386: Do not check vector size conflict when AVX512 is not explicitly set [PR 118815]

2025-02-13 Thread Haochen Jiang

Hi all, When AVX512 is not explicitly set, we should not take EVEX512 bit into consideration when checking vector size. It will solve the intrin header file reporting warnings when compiling with -Wsystem-headers. However, there is side effect on the usage for '-march=xxx -mavx10.1-256', where xx

[PATCH 2/2] i386: Re-alias avx10.2 to 512 bit and deprecate -mno-avx10.2-[256, 512]

2025-02-13 Thread Haochen Jiang

As mentioned in avx10.1 option deprecate patch, based on the feedback we got, we would like to re-alias avx10.x to 512 bit. For -mno- options, also mentioned in the previous patch, it is confusing what it is disabling when it comes to avx10. So we will only provide -mno-avx10.x options from AVX10.

[PATCH 0/2] i386: Adjust AVX10 related options

2025-02-13 Thread Haochen Jiang

Hi all, According to the previous feedback on our RFC for AVX10 option adjustment and discussion with LLVM, we finalized how we are going to handle that. The overall direction is to re-alias avx10.x alias to 512 bit and only using -mno-avx10.x to disable everything instead of the current confusin

[PATCH 1/2] i386: Deprecate -m[no-]avx10.1 and make -mno-avx10.1-512 to disable the whole AVX10.1

2025-02-13 Thread Haochen Jiang

Based on the feedback we got, we would like to re-alias avx10.x to 512 bit in the future. This leaves the current avx10.1 alias to 256 bit inconsistent. Since it has been there for GCC 14.1 and GCC 14.2, we decide to deprecate avx10.1 alias. The current proposal is not adding it back in the future,

[PATCH] i386: Fix AVX512BW intrin header with OPTIMIZE [PR 118813]

2025-02-09 Thread Haochen Jiang

Hi all, When moving intrins around for AVX10 implementation in GCC 14, the intrin _kshiftli_mask32 and _kshiftri_mask32 are wrongly wrapped by "#if __OPTIMIZE__" instead of "#ifdef __OPTIMIZE__", leading to the intrin file not `-Wsystem-headers -Wundef` clean since r14-4490. Ok for trunk? Thx, H
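The difference between the two guards can be shown with a minimal sketch. `DEMO_FEATURE` is a hypothetical macro that is deliberately left undefined, in the same way `__OPTIMIZE__` is undefined when compiling without optimization; the sketch assumes nothing about the real intrinsic headers.

```c
/* "#ifdef" only asks whether the macro exists, so it is clean under
   -Wundef.  Writing "#if DEMO_FEATURE" here instead would evaluate the
   undefined name as 0 and trigger a -Wundef warning.  */
#ifdef DEMO_FEATURE
#define DEMO_FEATURE_ENABLED 1
#else
#define DEMO_FEATURE_ENABLED 0
#endif

/* Report which branch of the guard was taken.  */
int demo_feature_enabled(void)
{
  return DEMO_FEATURE_ENABLED;
}
```

Both forms select the same branch when the macro is undefined; the difference is only in the diagnostics, which matters for headers that are expected to stay clean under `-Wsystem-headers -Wundef`.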

[RFC PATCH] i386: Re-alias -mavx10.2 to 512 bit and make -mno-avx10.x-512 disable the whole AVX10.x

2025-01-26 Thread Haochen Jiang

Hi all, AVX10 has been published for a year and a half and we have received a lot of feedback on it. One piece of feedback is on whether the alias option -mavx10.x should point to 256 or 512. If you also pay attention to LLVM community, you might see this thread related to AVX10 options just sent out sev

[PATCH] i386: Append -march=x86-64-v3 to AVX10.2/512 VNNI testcases

2025-01-21 Thread Haochen Jiang

Hi all, These two testcases were missed in the previous addition of -march=x86-64-v3 to silence warnings for -march=native tests. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/vnniint16-auto-vectorize-4.c: Append -march=x86-64-v3. * gcc.target/i386/vn

[PATCH 13/13] i386: Omit "p" for packed in intrin name for FP8 convert

2025-01-21 Thread Haochen Jiang

gcc/ChangeLog: * config/i386/avx10_2-512convertintrin.h: Omit "p" for packed for FP8. * config/i386/avx10_2convertintrin.h: Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-convert-1.c: Adjust intrin call. * gcc.target/i386/avx10_2-512-vcvtbia

[PATCH 12/13] i386: Change mnemonics from VCVT[, T]NEBF162I[, U]BS to VCVT[, T]BF162I[, U]BS

2025-01-21 Thread Haochen Jiang

gcc/ChangeLog: PR target/118270 * config/i386/avx10_2-512satcvtintrin.h: Change intrin and builtin name according to new mnemonics. * config/i386/avx10_2satcvtintrin.h: Ditto. * config/i386/i386-builtin.def (BDESC): Ditto. * config/i386/sse.md

[PATCH 10/13] i386: Change mnemonics from VCVTNE2PH2[B, H]F8 to VCVT2PH2[B, H]F8

2025-01-21 Thread Haochen Jiang

gcc/ChangeLog: PR target/118270 * config/i386/avx10_2-512convertintrin.h: Change intrin and builtin name according to new mnemonics. * config/i386/avx10_2convertintrin.h: Ditto. * config/i386/i386-builtin.def (BDESC): Ditto. * config/i386/sse.md

[PATCH 04/13] i386: Change mnemonics from V[CMP, MAX, MIN]PBF16 to V[CMP, MAX, MIN]BF16

2025-01-21 Thread Haochen Jiang

gcc/ChangeLog: PR target/118270 * config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin name according to new mnemonics. * config/i386/avx10_2bf16intrin.h: Ditto. * config/i386/i386-builtin.def (BDESC): Ditto. * config/i386/sse.md (a

[PATCH 11/13] i386: Change mnemonics from VCVTNEPH2[B, H]F8 to VCVTPH2[B, H]F8

2025-01-21 Thread Haochen Jiang

gcc/ChangeLog: PR target/118270 * config/i386/avx10_2-512convertintrin.h: Change intrin and builtin name according to new mnemonics. * config/i386/avx10_2convertintrin.h: Ditto. * config/i386/i386-builtin.def (BDESC): Ditto. * config/i386/sse.md

[PATCH 08/13] i386: Change mnemonics from V[GETEXP, FPCLASS]PBF16 to V[GETEXP.FPCLASS]BF16

2025-01-21 Thread Haochen Jiang

Besides mnemonics change, this patch also fixed SDE test fail for FPCLASS. gcc/ChangeLog: PR target/118270 * config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin name according to new mnemonics. * config/i386/avx10_2bf16intrin.h: Ditto. * config/i

[PATCH 03/13] i386: Change mnemonics from VF[, N]M[ADD, SUB][132, 213, 231]NEPBF16 to VF[, N]M[ADD, SUB][132, 213, 231]BF16

2025-01-21 Thread Haochen Jiang

gcc/ChangeLog: PR target/118270 * config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin names according to new mnemonics. * config/i386/avx10_2bf16intrin.h: Ditto. * config/i386/i386-builtin.def (BDESC): Ditto. * config/i386/sse.md (

[PATCH 00/13] Realign x86 GCC after Binutils change [PR118270]

2025-01-21 Thread Haochen Jiang

Hi all, Recently, DMR ISAs got lots of changes in mnemonics. The detailed changes are: - NE would be removed for all AVX10.2 new insns - VCOMSBF16 -> VCOMISBF16 - P for packed omitted for AI data types (BF16, TF32, FP8) For AMX-AVX512 change, it has been upstreamed previously, the remaining

[PATCH 09/13] i386: Change mnemonics from VCOMSBF16 to VCOMISBF16

2025-01-21 Thread Haochen Jiang

Besides the mnemonics change, this patch also uses the compare pattern instead of UNSPEC. gcc/ChangeLog: PR target/118270 * config/i386/avx10_2bf16intrin.h: Change intrin and builtin name according to new mnemonics. * config/i386/i386-builtin.def (BDESC): Ditto.

[PATCH 07/13] i386: Change mnemonics from V[RSQRT, SCALEF, SQRTNE]PBF16 to V[RSQRT.SCALEF.SQRT]BF16

2025-01-21 Thread Haochen Jiang

gcc/ChangeLog: PR target/118270 * config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin name according to new mnemonics. * config/i386/avx10_2bf16intrin.h: Ditto. * config/i386/i386-builtin.def (BDESC): Ditto. * config/i386/sse.md (U

[PATCH 06/13] i386: Change mnemonics from V[GETMANT, REDUCENE, RNDSCALENE]PBF16 to V[GETMANT, REDUCE, RNDSCALE]BF16

2025-01-21 Thread Haochen Jiang

gcc/ChangeLog: PR target/118270 * config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin name according to new mnemonics. * config/i386/avx10_2bf16intrin.h: Ditto. * config/i386/i386-builtin.def (BDESC): Ditto. * config/i386/sse.md (U

[PATCH 05/13] i386: Change mnemonics from VMINMAXNEPBF16 to VMINMAXBF16

2025-01-21 Thread Haochen Jiang

gcc/ChangeLog: PR target/118270 * config/i386/avx10_2-512minmaxintrin.h: Change intrin and builtin name according to new mnemonics. * config/i386/avx10_2minmaxintrin.h: Ditto. * config/i386/i386-builtin.def (BDESC): Ditto. * config/i386/sse.md

[PATCH 01/13] i386: Enhance AMX tests

2025-01-21 Thread Haochen Jiang

After Binutils got changed, the previous usage on intrin will raise warning for assembler. We need to change that. Besides that, there are separate issues for both AMX-MOVRS and AMX-TRANSPOSE. For AMX-MOVRS, t2rpntlvwrs tests wrongly used AMX-TRANSPOSE intrins in test. Since the only difference be

[PATCH 02/13] i386: Change mnemonics from V[ADDNE, DIVNE, MULNE, RCP, SUBNE]PBF16 to V[ADD, DIV, MUL, RCP, SUB]BF16

2025-01-21 Thread Haochen Jiang

gcc/ChangeLog: PR target/118270 * config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin name according to new mnemonics. * config/i386/avx10_2bf16intrin.h: Ditto. * config/i386/i386-builtin.def (BDESC): Ditto. * config/i386/sse.md (div3): Ad

[PATCH] i386: Remove not used model number for Diamond Rapids

2025-01-07 Thread Haochen Jiang

Hi all, In ISE, the model number for Diamond Rapids is 13_01H. Remove 0x00 since it is unused. Ref: https://cdrdv2.intel.com/v1/dl/getContent/671368 Ok for trunk? Thx, Haochen gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Remove 0x00. --- gcc/common/config/i386/cpuin

[PATCH] i386: Change mnemonics from TCVTROWPS2PBF16[H, L] to TCVTROWPS2BF16[H, L]

2025-01-03 Thread Haochen Jiang

Hi all, The mnemonics for TCVTROWPS2PBF16[H,L] have been changed to TCVTROWPS2BF16[H,L] in ISE056. There will also be some more BF16 mnemonics changes upcoming, which will fix the regression in PR118270. Bootstrapped and tested on x86_64-pc-linux-gnu. Ok for trunk? Ref: https://cdrdv2.intel.com/v1/

[PATCH] Support Intel AVX10.2 minmax, vector copy and compare instructions

2024-12-03 Thread Haochen Jiang

6E): Ditto. (EVEX_LEN_MAP5_7E): Ditto. (EVEX_W_MAP5_6E_P_1): Ditto. (EVEX_W_MAP5_7E_P_1): Ditto. * i386-opc.tbl: Add AVX10.2 instructions. * i386-mnem.h: Regenerated. * i386-tbl.h: Ditto. Co-authored-by: Jun Zhang Co-authored-by: Haochen Jiang ---

[PATCH] i386/testsuite: Correct AVX10.2 FP8 test mask usage

2024-11-22 Thread Haochen Jiang

Hi all, Under FP8, we should not use AVX512F_LEN_HALF to get the mask size since it will get 16 instead of 8 and fall into the wrong if branch. Correct the usage for vcvtneph2[b,h]f8[,s] runtime test. Tested under sde. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i38

[PATCH] i386/testsuite: Do not append AVX10.2 option for check_effective_target

2024-11-21 Thread Haochen Jiang

Hi all, When -mavx10.2 meets -march with AVX512 enabled, it will report a warning for vector size conflict. The warning will prevent the test from running on GCC with arch native build on those platforms when check_effective_target. Remove AVX10.2 options since we are using inline asm and it actually does not

[PATCH] i386/testsuite: Enhance AVX10.2 vmovd/w testcases

2024-11-21 Thread Haochen Jiang

Hi all, Under -fno-omit-frame-pointer, which is the Solaris/x86 default, %ebp will be used. Check both %ebp and %esp to avoid errors from that. Tested under -m32 w/ and w/o -fno-omit-frame-pointer. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: PR target/117697 * gcc.target/i

[GCC13 PATCH] testsuite: Correct dg-error to dg-warning for cmpccxadd testcase in GCC13

2024-11-13 Thread Haochen Jiang

Hi all, In GCC13, the error for GCC14+ is actually a warning for the pointer type. Correct that in testcase. Commit as obvious. Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/cmpccxadd-1b.c: Change to dg-warning. --- gcc/testsuite/gcc.target/i386/cmpccxadd-1b.c | 4 ++-- 1 fi

[gcc-wwwdocs PATCH] gcc-15: Mention new ISA and Diamond Rapids support for x86_64 backend

2024-11-10 Thread Haochen Jiang

Hi all, This patch will add recent new ISA and arch support for x86_64 backend into gcc-wwwdocs. Ok for gcc-wwwdocs? Thx, Haochen --- htdocs/gcc-15/changes.html | 37 + 1 file changed, 37 insertions(+) diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15

[PATCH] Use IN_RANGE in prefetch builtin and fix typo in prefetch testcase (was [r15-4833 Regression] FAIL: gcc.dg/builtin-prefetch-1.c (test for warnings, line 36) on Linux/x86_64)

2024-11-01 Thread Haochen Jiang

Hi all, These are the last minute changes that should apply to MOVRS patch but disappeared in patch. Using IN_RANGE will avoid second usage of INTVAL for prefetch check. Also fixed typos in prefetch testcase. Ok for trunk? Thx, Haochen gcc/ChangeLog: * builtins.cc (expand_builtin_pre
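The point about evaluating the argument only once can be sketched as follows. `DEMO_IN_RANGE` is a hypothetical stand-in written for illustration and is not copied from GCC's sources; it assumes `LOWER <= UPPER` and relies on unsigned wraparound so that a single comparison covers both bounds.

```c
/* Hypothetical stand-in for a range-check macro.  VALUE appears only
   once in the expansion, so an expensive accessor passed as VALUE is
   evaluated a single time.  A value below LOWER wraps around to a
   huge unsigned number and therefore fails the comparison.  */
#define DEMO_IN_RANGE(VALUE, LOWER, UPPER)                      \
  ((unsigned long long) (VALUE) - (unsigned long long) (LOWER)  \
   <= (unsigned long long) (UPPER) - (unsigned long long) (LOWER))

/* Return nonzero if VALUE lies in the inclusive range [LOWER, UPPER].  */
int demo_arg_ok(long long value, long long lower, long long upper)
{
  return DEMO_IN_RANGE(value, lower, upper);
}
```

Compared with writing `value >= lower && value <= upper`, this form mentions the value once, which is the property the message relies on to avoid a second `INTVAL` use.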

[PATCH] i386: Do not allow pointer conversion for CMPccXADD intrin under -O0

2024-11-01 Thread Haochen Jiang

Hi all, The pointer conversion to wider type under macro would not consider whether the higher bit is cleaned or not. It will lead to unexpected cmp result. After this change, it will throw an incompatible pointer type error just like -O2 does currently. Bootstrapped and tested on x86_64-pc-linux

[PATCH 1/2] i386: Add new model number for Arrow Lake

2024-10-31 Thread Haochen Jiang

gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Add new model number for Arrow Lake. --- gcc/common/config/i386/cpuinfo.h | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h index 0dcdaafeca5..f415f

[PATCH 2/2] Initial Diamond Rapids Support

2024-10-31 Thread Haochen Jiang

gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Handle Diamond Rapids. * common/config/i386/i386-common.cc (processor_name): Add Diamond Rapids. (processor_alias_table): Ditto. * common/config/i386/i386-cpuinfo.h (enum processor_types)

[PATCH 0/2] Add arch support for Intel CPUs

2024-10-31 Thread Haochen Jiang

Hi all, I have just landed new ISA patches on trunk. The next step will be the arch support for ISE055 mentioned CPUs. There are two changes in ISE055 on CPUs: - A new model number is added for Arrow Lake. - Diamond Rapids Support is added. The following two patches will reflect those chang

[PATCH] testsuite: Adjust AVX10.2 check_effective_target

2024-10-29 Thread Haochen Jiang

Hi all, Since Binutils hasn't fully merged all AVX10.2 insts, only testing one inst/intrin in AVX10.2 is never sufficient for check_effective_target. Like APX_F, use inline asm to do the target check. Tested w/ and w/o Binutils with full AVX10.2 support. Ok for trunk? Thx, Haochen gcc/testsuit

[PATCH 1/7] Support Intel SM4 EVEX instructions

2024-10-21 Thread Haochen Jiang

gcc/ChangeLog: * config/i386/i386-builtin-types.def: Add DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI). * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle V16SI_FTYPE_V16SI_V16SI. * con

[PATCH 6/7] Support Intel MOVRS

2024-10-21 Thread Haochen Jiang

arget/i386/sse-23.c: Ditto * gcc.target/i386/avx10_2-512movrs-1.c: New test. * gcc.target/i386/avx10_2-movrs-1.c: Ditto. * gcc.target/i386/movrs-1.c: Ditto. Co-authored-by: Haochen Jiang --- gcc/builtins.cc | 4 +- gcc/common/config/i386

[PATCH 5/7] Support Intel AMX-FP8

2024-10-21 Thread Haochen Jiang

From: Liwei Xu gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect amx-fp8. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_FP8_SET): New macros. (OPTION_MASK_ISA2_AMX_FP8_UNSET): Ditto. (ix86_handle_option): Ha

[PATCH 7/7] Support Intel AMX-MOVRS

2024-10-21 Thread Haochen Jiang

From: "Hu, Lin1" gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect AMX-MOVRS. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_MOVRS_SET): New. (OPTION_MASK_ISA2_AMX_MOVRS_UNSET): Ditto. (ix86_handle_option): H

[PATCH 4/7] Support Intel AMX-TRANSPOSE

2024-10-21 Thread Haochen Jiang

gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect AMX-TRANSPOSE. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_TRANSPOSE_SET, OPTION_MASK_ISA2_AMX_TRANSPOSE_UNSET): New. (ix86_handle_option): Handle -mamx-tran

[PATCH 3/7] Support Intel AMX-TF32

2024-10-21 Thread Haochen Jiang

gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect AMX-TF32. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_TF32_SET, OPTION_MASK_ISA2_AMX_TF32_UNSET): New. (ix86_handle_option): Handle -mamx-tf32. * common/conf

[PATCH 2/7] Support Intel AMX-AVX512

2024-10-21 Thread Haochen Jiang

gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect AMX-AVX512. * common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_AVX512_SET, OPTION_MASK_ISA2_AMX_AVX512_UNSET): New. (ix86_handle_option): Handle -mamx-avx512. * comm

[PATCH 0/7] Support Intel Diamond Rapid new features

2024-10-21 Thread Haochen Jiang

Hi all, ISE054 has just been released and you can find doc from here: https://cdrdv2.intel.com/v1/dl/getContent/671368 Diamond Rapids features are added in this ISE, including AMX related instructions, SM4 EVEX extension and MOVRS/PREFETCHRST2. The following seven patches will add all the new f

[PATCH] i386: Refactor get_intel_cpu

2024-10-17 Thread Haochen Jiang

Hi all, ISE054 has just been disclosed and you can find doc from here: https://cdrdv2.intel.com/v1/dl/getContent/671368 From ISE, it shows that we will have family 0x13 for Diamond Rapids. Therefore, we need to refactor the get_intel_cpu to accept new families. Also I did some reorder in the sw
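As background on why a new family value needs its own handling: the family a CPU-detection routine switches on is the CPUID "display family", which adds the extended family field to the base family field when the base field is 0xF. The following is a minimal sketch of that decoding, assuming the standard CPUID leaf 1 EAX layout. The names `demo_display_family` and `demo_family_self_check` are hypothetical and are not taken from GCC's `get_intel_cpu`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: decode the display family from
   the EAX value returned by CPUID leaf 1.  */
unsigned int
demo_display_family (uint32_t eax)
{
  unsigned int base_family = (eax >> 8) & 0x0f;  /* bits 11:8  */
  unsigned int ext_family = (eax >> 20) & 0xff;  /* bits 27:20 */

  /* The extended field only contributes when the base field is 0xF.  */
  if (base_family == 0x0f)
    return base_family + ext_family;
  return base_family;
}

/* Hypothetical self-check: base 0xF plus extended 0x4 gives 0x13,
   while a family 6 encoding stays 6.  */
void
demo_family_self_check (void)
{
  assert (demo_display_family (0x00400F00u) == 0x13);
  assert (demo_display_family (0x000006A0u) == 6);
}
```

A detection routine that only ever compared against family 6 would never reach a family 0x13 part, which is presumably why the snippet describes a refactor rather than a new `case` label.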

[PATCH] testsuite: Fix typos for AVX10.2 convert testcases

2024-10-17 Thread Haochen Jiang

From: Victor Rodriguez Hi all, There are some typos in AVX10.2 vcvtne[,2]ph[b,h]f8[,s] testcases. They will lead to type mismatch. Previously they were not found because the binutils support had not been checked in. Ok for trunk? Thx, Haochen --- Fix typos related to types for vcvtne[,2]ph[b,h]f8[,s] testcas

[PATCH] testsuite: Add -march=x86-64-v3 to AVX10 testcases to slience warning for GCC built with AVX512 arch

2024-10-16 Thread Haochen Jiang

Hi all, Currently, when building GCC configured with --with-arch=native on AVX512 machines, running the AVX10.2 testcases produces vector size warnings. This is expected but annoying. Simply add -march=x86-64-v3 to override --with-arch=native and silence all the warnings. Tested on x86-64-linux-gnu. Ok f

[gcc-wwwdocs PATCH] gcc-14: Mention -march=gracemont support in x86_64

2024-09-18 Thread Haochen Jiang

Hi all, When I was backporting my doc patch in gcc trunk today, I found when adding -march=gracemont in GCC14, the corresponding wwwdoc is missing. This patch is adding that. Ok for wwwdocs trunk? Thx, Haochen --- htdocs/gcc-14/changes.html | 4 1 file changed, 4 insertions(+) diff --git

[PATCH v2] i386: Enhance AVX10.2 convert tests

2024-09-18 Thread Haochen Jiang

Hi all, The AVX10.2 convert tests were all previously missing mask tests; this patch adds them. Tested on sde with assembler with corresponding insts. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Enhance mas

[PATCH] i386: Enhance AVX10.2 convert tests

2024-09-17 Thread Haochen Jiang

Hi all, The AVX10.2 convert tests were all previously missing mask tests; this patch adds them. Tested on sde with assembler with these insts. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Enhance mask test.

[PATCH] i386: Add missing avx512f-mask-type.h include

2024-09-17 Thread Haochen Jiang

Hi all, Since commit r15-3594, we fixed the bugs in MASK_TYPE for AVX10.2 testcases, but we missed the following four. The tests do not FAIL since the binutils part hasn't been merged yet, which leaves them UNSUPPORTED. But the avx512f-mask-type.h needs to be included, otherwise, it will be c

[PATCH] doc: Add more alias option and reorder Intel CPU -march documentation

2024-09-17 Thread Haochen Jiang

Hi all, Since r15-3539, there are requests coming in to add other alias option documentation. This patch will add all of them, including corei7, corei7-avx, core-avx-i, core-avx2, atom, slm, gracemont and emeraldrapids. Also in the patch, I reordered that part of documentation, currently all the

[PATCH] doc: Enhance Intel CPU documentation

2024-09-05 Thread Haochen Jiang

Hi all, This patch will add those recent aliased CPU names into documentation for clarity. Ready to push for trunk and backport to GCC14 and part of the patch to GCC13 as an obvious fix if no objection. Thx, Haochen gcc/ChangeLog: PR target/116617 * doc/invoke.texi: Add meteo

[PATCH] i386: Fix incorrect avx512f-mask-type.h include

2024-09-04 Thread Haochen Jiang

Hi all, In avx512f-mask-type.h, we need SIZE to be defined to get MASK_TYPE defined correctly. Fix those testcases where SIZE is not defined before the include of avx512f-mask-type.h. Note that for convert intrins in AVX10.2, they will need more modifications due to the current tests did not in
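The include-order problem described above can be sketched in plain C. In a `#if` expression, an identifier that is not defined as a macro evaluates to 0, so a size-based type selection silently picks its narrowest branch when the size macro is defined too late. The macro `DEMO_SIZE` and the typedef names below are hypothetical stand-ins and do not reproduce the contents of avx512f-mask-type.h.

```c
#include <assert.h>
#include <stdint.h>

/* First evaluation: DEMO_SIZE is not defined yet, so the preprocessor
   treats it as 0 and the narrowest branch is selected.  */
#if DEMO_SIZE <= 8
typedef uint8_t demo_early_mask_t;
#elif DEMO_SIZE <= 16
typedef uint16_t demo_early_mask_t;
#else
typedef uint32_t demo_early_mask_t;
#endif

/* The size is defined only now, as in a testcase that sets it after
   the include.  */
#define DEMO_SIZE 32

/* Second evaluation: the same selection now sees 32 and picks the
   wide type.  */
#if DEMO_SIZE <= 8
typedef uint8_t demo_late_mask_t;
#elif DEMO_SIZE <= 16
typedef uint16_t demo_late_mask_t;
#else
typedef uint32_t demo_late_mask_t;
#endif

/* Hypothetical self-check of the two outcomes.  */
void
demo_mask_order_self_check (void)
{
  assert (sizeof (demo_early_mask_t) == 1);
  assert (sizeof (demo_late_mask_t) == 4);
}
```

The compiler accepts both orderings without a diagnostic by default, so a test can keep compiling with a mask type that is too narrow; GCC's `-Wundef` option warns about the undefined identifier in the first `#if`.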

[PATCH] i386: Fix vfpclassph non-optimizied intrin

2024-09-02 Thread Haochen Jiang

Hi all, The non-optimized intrin has a typo in its mask type, which causes the high bits of __mmask32 to be unexpectedly zeroed. The test does not fail under O0 with current 1b since the testcase is wrong. We need to include avx512f-mask-type.h after SIZE is defined, or it will always be __mma
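The effect of a wrong mask type can be illustrated without any intrinsics: casting a 32-bit mask to a narrower integer type discards the upper bits before the value is used. This is a sketch only. The types and function names are hypothetical, and the excerpt does not show which narrower type the original typo used, so an 8-bit type is assumed here purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a narrow and a 32-bit mask type.  */
typedef uint8_t demo_mask8;
typedef uint32_t demo_mask32;

/* Buggy wrapper: the cast to the narrow type drops bits 8..31.  */
demo_mask32
demo_pass_mask_narrow (demo_mask32 m)
{
  return (demo_mask8) m;
}

/* Fixed wrapper: the mask keeps its full 32-bit width.  */
demo_mask32
demo_pass_mask_full (demo_mask32 m)
{
  return (demo_mask32) m;
}

/* Hypothetical self-check contrasting the two wrappers.  */
void
demo_mask_width_self_check (void)
{
  assert (demo_pass_mask_narrow (0xFFFF00FFu) == 0x000000FFu);
  assert (demo_pass_mask_full (0xFFFF00FFu) == 0xFFFF00FFu);
}
```

A test that only exercises masks with the upper bits clear cannot tell the two wrappers apart, which is consistent with the snippet's remark that the existing test did not fail.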

[gcc-wwwdocs PATCH] gcc-15: Mention recent update for x86_64 backend

2024-08-27 Thread Haochen Jiang

Hi all, Sorry for the disturbance: I mistyped gcc-patches as gcc-patchs, so I am resending the patch. This patch will add documentation for recent updates in the x86-64 backend. Ok for wwwdocs trunk? Thx, Haochen --- Mention AVX10.2 support and Xeon Phi removal in GCC 15. --- htdocs/gcc-15/changes.html

[PATCH 4/8] i386: Support vectorized BF16 add/sub/mul/div with AVX10.2 instructions

2024-08-25 Thread Haochen Jiang

From: Levy Hsu AVX10.2 introduces several non-exception instructions for BF16 vectors. Enable vectorized BF add/sub/mul/div operations by supporting the standard optabs for them. gcc/ChangeLog: * config/i386/sse.md (div3): New expander for BFmode div. (VF_BHSD): New mode iterator with

1 - 100 of 279 matches