> It would be nice to add to the documentation that INSN_BASE_REG_CLASS,
> INSN_INDEX_REG_CLASS, and REGNO_OK_FOR_INSN_BASE_P if defined have
> priority over older corresponding macros as it is already documented for
> REGNO_MODE_CODE_OK_FOR_BASE_P relating to REGNO_OK_FOR_BASE_P. But this
> small
Since -mapxf works similarly to -muintr, which emits an error for 32-bit
targets, add a !ia32 target guard for APX-related tests.
Committed as obvious fix after test.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-egprs-names.c: Compile for non-ia32.
* gcc.target/i386/apx-inline-gpr-nor
For vec_concatv2di, the "m" constraint in alternatives 0 and 1 could result in
an EGPR being allocated for operand 2 under -mapxf. Use "jm" instead.
Bootstrapped/regtested on x86-64-linux-gnu.
Ok for trunk?
gcc/ChangeLog:
* config/i386/sse.md (vec_concatv2di): Replace constraint "m"
with "j
Thanks, also there is another pattern missed that should use "ja" instead
of Bm. Will commit below changes.
gcc/ChangeLog:
* config/i386/sse.md (vec_concatv2di): Replace constraint "m"
with "jm" for alternative 0 and 1 of operand 2.
(sse4_1_3): Replace constraint "Bm" with
: New test.
* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.
* gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise.
Co-authored-by: Hu Lin1
Co-authored-by: Hongyu Wang
---
gcc/config/i386/i386.cc | 252 --
gcc/config/i3
Thanks for the fix and refinement!
I think the addr attr looks more reasonable, just one small issue that
EGPR was not only encoded with REX2 prefix, there are several
instructions that encode EGPR using evex prefix. So I think
addr_rex2/addr_rex may be a misleading note. I'd prefer still using
gp
For sure.
Jeff Law wrote on Wed, Jan 15, 2020 at 4:48 AM:
>
> On Tue, 2019-12-24 at 13:31 +0800, Hongyu Wang wrote:
> > Hi:
> > For avx512f scalar instructions, current builtin function like
> > __builtin_ia32_*{sd,ss}_round can be replaced by
> > __builtin_ia32_*{sd,ss}_mask_ro
,
__builtin_ia32_vfmaddsd3_round,
__builtin_ia32_vfmaddss3_round): Remove.
*gcc.target/i386/sse-13.c: Ditto.
*gcc.target/i386/sse-23.c: Ditto.
Regards,
Hongyu Wang
From 9cc4928aad5770c53ff580f5c996092cdaf2f9ba Mon Sep 17 00:00:00 2001
From: hongyuw1
Date: Wed, 18 Dec 2019 14:52:54 +
Subject
The latest APX spec announced removal of SHA/KEYLOCKER evex promotion [1],
which means the SHA/KEYLOCKER insns do not support EGPR when APX
is enabled. Update the corresponding constraints to their EGPR-disabled
counterparts.
Bootstrapped and regtested on x86-64-pc-linux-gnu.
Ok for trunk?
[1].htt
Thanks for fixing this! Didn't notice that the pointer conversion can
cause this issue...
Would it be possible to use a local array, like
char a[64] = (char *)p
__asm__ volatile ("ldtilecfg\t%X0" :: "m" (a));
If not, for the two patterns we can use "m" instead of "jm" as APX
supports EGPR extension for
Thanks, this is the patch I'm going to check-in
Hongtao Liu wrote on Wed, Jan 10, 2024 at 16:02:
>
> On Tue, Jan 9, 2024 at 3:09 PM Hongyu Wang wrote:
> >
> > Hi,
> >
> > For APX, the inline asm behavior was not mentioned in any document
> > before. Add description
I'm going to check-in this if no objection
Hongyu Wang wrote on Tue, Jan 9, 2024 at 15:14:
>
> Hi,
>
> This patch adds missing description for inline asm behavior and related
> compiler switch for APX.
>
> Ok for gcc-wwwdocs?
>
> ---
> htdocs/gcc-14/changes.html | 6 +++
Hi,
Currently move_max follows the tuning feature first, but ideally it
should sync with prefer-vector-width when it is explicitly set to keep
vector move and operation with same vector size.
Bootstrapped/regtested on x86-64-pc-linux-gnu{-m32,}
OK for trunk?
gcc/ChangeLog:
PR target/11
Hi,
As Coudert points out, this test fails on darwin as it does not
support _Decimal64, so require dfp for it.
Pushed as obvious fix.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112943.c: Require dfp.
---
gcc/testsuite/gcc.target/i386/pr112943.c | 2 +-
1 file changed, 1 insertion(+)
Hi,
The supported sub-features for APX were missing from the option documentation
and the target attribute section. Add the missing ones.
Ok for trunk?
gcc/ChangeLog:
* config/i386/i386.opt: Add supported sub-features.
* doc/extend.texi: Add description for target attribute.
---
gcc/config/i
Hi,
For APX, the inline asm behavior was not mentioned in any document
before. Add description for it.
Ok for trunk?
gcc/ChangeLog:
* config/i386/i386.opt: Adjust document.
* doc/invoke.texi: Add description for
-mapx-inline-asm-use-gpr32.
---
gcc/config/i386/i386.opt |
Hi,
This patch adds missing description for inline asm behavior and related
compiler switch for APX.
Ok for gcc-wwwdocs?
---
htdocs/gcc-14/changes.html | 6 ++
1 file changed, 6 insertions(+)
diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index e3a68998..73a90d30 1006
From: Kong Lingling
For *one_cmplsi2_2_zext, it will be split into xor, so its NDD form will be
added together with xor NDD support.
gcc/ChangeLog:
* config/i386/i386.md (one_cmpl2): Add new constraints for NDD
and adjust output template.
(*one_cmpl2_1): Likewise.
linux-gnu{-m32,} and sde.
OK for trunk?
Hongyu Wang (8):
[APX NDD] Restrict TImode register usage when NDD enabled
[APX NDD] Disable seg_prefixed memory usage for NDD add
[APX NDD] Support APX NDD for left shift insns
[APX NDD] Support APX NDD for right shift insns
[APX NDD] Support APX ND
From: Kong Lingling
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd
parameter and adjust for NDD.
* config/i386/i386-protos.h: Add use_ndd parameter for
ix86_unary_operator_ok and ix86_expand_unary_operator.
* config/i
Under APX NDD, previous TImode allocation will have issue that it was
originally allocated using continuous pair, like rax:rdi, rdi:rdx.
This will cause issue for all TImode NDD patterns. For NDD we will not
assume the arithmetic operations like add have dependency between dest
and src1, then writ
From: Kong Lingling
gcc/ChangeLog:
* config/i386/i386.md: (addsi_1_zext): Add new alternatives for
NDD and adjust output templates.
(*add_2): Likewise.
(*addsi_2_zext): Likewise.
(*add_3): Likewise.
(*addsi_3_zext): Likewise.
(*adddi_4): Li
gcc/ChangeLog:
* config/i386/i386.md (*movcc_noc): Extend with new constraints
to support NDD.
(*movsicc_noc_zext): Likewise.
(*movsicc_noc_zext_1): Likewise.
(*movqicc_noc): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-ndd-cmov.c: New
NDD uses the EVEX prefix, so when a segment prefix is also applied, the instruction
could exceed its 15-byte limit, especially when adding immediates. This could happen
when "e" constraint accepts any UNSPEC_TPOFF/UNSPEC_NTPOFF constant and it will
add the offset to segment register, which will be encoded usi
From: Kong Lingling
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_fixup_binary_operands_no_copy):
Add use_ndd parameter and parse it.
* config/i386/i386-protos.h (ix86_fixup_binary_operands_no_copy):
Change define.
* config/i386/i386.md (sub3): Add new
Similar to LSHIFT, rshift does not need to omit $1 for NDD form.
gcc/ChangeLog:
* config/i386/i386.md (ashr3_cvt): Extend with new
alternatives to support NDD, and adjust output templates.
(*ashr3_1): Likewise for SI/DI mode.
(*lshr3_1): Likewise.
(*si3_1_zex
For shld/shrd insns, the old pattern uses match_dup 0 as its shift src and uses
+r*m as its constraint. To support NDD we added new define_insns to handle NDD
form pattern with extra input and dest operand to be fixed in register.
gcc/ChangeLog:
* config/i386/i386.md (x86_64_shld_ndd): New
gcc/ChangeLog:
* config/i386/i386.md (*3_1): Extend with a new
alternative to support NDD for SI/DI rotate, and adjust output
template.
(*si3_1_zext): Likewise.
(*3_1): Likewise for QI/HI modes.
(rcrsi2): Likewise, and use nonimmediate_operand for op
For left shift, there is an optimization TARGET_DOUBLE_WITH_ADD that shl
1 can be optimized to add. As NDD form of add requires src operand to
be register since NDD cannot take 2 memory src, we currently just keep
using NDD form shift instead of add.
The optimization TARGET_SHIFT1 will try to remo
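The TARGET_DOUBLE_WITH_ADD rewrite mentioned in this message rests on a simple identity: shifting left by one bit gives the same result as adding a value to itself. A minimal sketch of that identity in C follows; the function names are made up for illustration and are not taken from the patch.

```c
#include <stdint.h>

/* Doubling written as a left shift by one bit. */
static uint64_t double_with_shift(uint64_t x)
{
    return x << 1;
}

/* Doubling written as an addition. For unsigned operands both forms
   wrap modulo 2^64, so they agree for every input. */
static uint64_t double_with_add(uint64_t x)
{
    return x + x;
}
```

Which form the compiler emits for a given target is a tuning decision; the sketch only shows why the two are interchangeable at the arithmetic level.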
From: Kong Lingling
Similar to AND insn, two splitters need to be adjusted to prevent
misoptimization for NDD OR/XOR.
Also adjust *one_cmplsi2_2_zext and its corresponding splitter that will
generate xor insn.
gcc/ChangeLog:
* config/i386/i386.md (3): Add new alternative for NDD
From: Kong Lingling
APX NDD provides an extra destination register operand for several gpr
related legacy insns, so a new alternative can be adopted to operand1
with "r" constraint.
This first patch supports NDD for the add instruction, and keeps using lea
when all operands are registers since lea
From: Kong Lingling
Similar to *add3_doubleword, operands[1] may not be equal to operands[0] so
extra move is required.
gcc/ChangeLog:
* config/i386/i386.md (*sub3_doubleword): Add new alternative for
NDD, and emit move when operands[0] not equal to operands[1].
(*sub3_doub
From: Kong Lingling
Legacy adc patterns are commonly adopted to TImode add, when extending TImode
add to NDD version, operands[0] and operands[1] can be different, so extra move
should be emitted if those patterns have optimization when adding const0_rtx.
NDD instructions will automatically zero
From: Kong Lingling
For NDD form AND insn, there are three splitter fixes after extending legacy
patterns.
1. APX NDD does not support high QImode registers like ah, bh, ch, dh, so for
some optimization splitters that generate highpart zero_extract for QImode
need to be prohibited under NDD pat
For TImode shifts, they are split by splitter functions, which assume
operands[0] and operands[1] to be the same. For the NDD alternative the
assumption may not be true so add split functions for NDD to emit the NDD
form instructions, and omit the handling of !64bit target split.
Although the N
Uros Bizjak wrote on Tue, Dec 5, 2023 at 18:46:
>
> On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang wrote:
> >
> > Under APX NDD, previous TImode allocation will have issue that it was
> > originally allocated using continuous pair, like rax:rdi, rdi:rdx.
> >
> > This will cau
) == ISA_APX_NDD instead of
checking alternative at asm output stage.
Bootstrapped & regtested on x86_64-pc-linux-gnu{-m32,} and sde.
Ok for master?
Hongyu Wang (7):
[APX NDD] Disable seg_prefixed memory usage for NDD add
[APX NDD] Support APX NDD for left shift insns
[APX NDD] Support
From: Kong Lingling
For *one_cmplsi2_2_zext, it will be split into xor, so its NDD form will be
added together with xor NDD support.
gcc/ChangeLog:
* config/i386/i386.md (one_cmpl2): Add new constraints for NDD
and adjust output template.
(*one_cmpl2_1): Likewise.
From: Kong Lingling
APX NDD provides an extra destination register operand for several gpr
related legacy insns, so a new alternative can be adopted to operand1
with "r" constraint.
This first patch supports NDD for the add instruction, and keeps using lea
when all operands are registers since lea
NDD uses the EVEX prefix, so when a segment prefix is also applied, the instruction
could exceed its 15-byte limit, especially when adding immediates. This could happen
when "e" constraint accepts any UNSPEC_TPOFF/UNSPEC_NTPOFF constant and it will
add the offset to segment register, which will be encoded usi
From: Kong Lingling
Similar to AND insn, two splitters need to be adjusted to prevent
misoptimization for NDD OR/XOR.
Also adjust *one_cmplsi2_2_zext and its corresponding splitter that will
generate xor insn.
gcc/ChangeLog:
* config/i386/i386.md (3): Add new alternative for NDD
From: Kong Lingling
Legacy adc patterns are commonly adopted to TImode add, when extending TImode
add to NDD version, operands[0] and operands[1] can be different, so extra move
should be emitted if those patterns have optimization when adding const0_rtx.
For TImode insn, there could be register
From: Kong Lingling
Similar to *add3_doubleword, operands[1] may not be equal to operands[0] so
extra move and earlyclobber are required.
gcc/ChangeLog:
* config/i386/i386.md (*sub3_doubleword): Add new alternative for
NDD, adopt '&' modifier to NDD dest and emit move when operands
From: Kong Lingling
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd
parameter and adjust for NDD.
* config/i386/i386-protos.h: Add use_ndd parameter for
ix86_unary_operator_ok and ix86_expand_unary_operator.
* config/i
From: Kong Lingling
For NDD form AND insn, there are three splitter fixes after extending legacy
patterns.
1. APX NDD does not support high QImode registers like ah, bh, ch, dh, so for
some optimization splitters that generate highpart zero_extract for QImode
need to be prohibited under NDD pat
gcc/ChangeLog:
* config/i386/i386.md (*movcc_noc): Extend with new constraints
to support NDD.
(*movsicc_noc_zext): Likewise.
(*movsicc_noc_zext_1): Likewise.
(*movqicc_noc): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-ndd-cmov.c: New
For shld/shrd insns, the old pattern uses match_dup 0 as its shift src and uses
+r*m as its constraint. To support NDD we added new define_insns to handle NDD
form pattern with extra input and dest operand to be fixed in register.
gcc/ChangeLog:
* config/i386/i386.md (x86_64_shld_ndd): New
From: Kong Lingling
gcc/ChangeLog:
* config/i386/i386.md: (addsi_1_zext): Add new alternatives for
NDD and adjust output templates.
(*add_2): Likewise.
(*addsi_2_zext): Likewise.
(*add_3): Likewise.
(*addsi_3_zext): Likewise.
(*adddi_4): Li
For left shift, there is an optimization TARGET_DOUBLE_WITH_ADD that shl
1 can be optimized to add. As NDD form of add requires src operand to
be register since NDD cannot take 2 memory src, we currently just keep
using NDD form shift instead of add.
The optimization TARGET_SHIFT1 will try to remo
Similar to LSHIFT, rshift does not need to omit $1 for NDD form.
gcc/ChangeLog:
* config/i386/i386.md (ashr3_cvt): Extend with new
alternatives to support NDD, and adjust output templates.
(*ashr3_1): Likewise for SI/DI mode.
(*lshr3_1): Likewise.
(*si3_1_zex
From: Kong Lingling
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_fixup_binary_operands_no_copy):
Add use_ndd parameter and parse it.
* config/i386/i386-protos.h (ix86_fixup_binary_operands_no_copy):
Change define.
* config/i386/i386.md (sub3): Add new
For TImode shifts, they are split by splitter functions, which assume
operands[0] and operands[1] to be the same. For the NDD alternative the
assumption may not be true so add split functions for NDD to emit the NDD
form instructions, and omit the handling of !64bit target split.
Although the N
gcc/ChangeLog:
* config/i386/i386.md (*3_1): Extend with a new
alternative to support NDD for SI/DI rotate, and adjust output
template.
(*si3_1_zext): Likewise.
(*3_1): Likewise for QI/HI modes.
(rcrsi2): Likewise, and use nonimmediate_operand for op
Hi,
The ashl/lshr/ashr expanders call ix86_expand_binary_operator, while
they will be called for some post-reload split, and TARGET_APX_NDD is
required for these calls to avoid force-load to memory at postreload
stage.
Bootstrapped/regtested on x86-64-pc-linux-gnu{-m32,}
Ok for master?
gcc/Cha
> > +__int128 u128_2 = (9223372036854775808 << 4) * foo0_u8_0; /* {
> > dg-warning "integer constant is so large that it is unsigned" "so large" }
> > */
>
> Just you can use (9223372036854775807LL + (__int128) 1) instead of
> 9223372036854775808
> to avoid the warning.
> The testcase will I
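The reviewer's suggestion can be tried in isolation. The following is a minimal sketch, assuming a GCC-compatible compiler on a 64-bit target where `__int128` is available; the function name `two_pow_63` is hypothetical and not part of the testcase under review.

```c
/* Builds the value 2^63 without writing the literal 9223372036854775808,
   which is too large for long long and triggers the warning quoted above.
   The largest long long literal is widened to __int128 by the cast on the
   other operand, so the addition is carried out in 128 bits. */
static __int128 two_pow_63(void)
{
    return 9223372036854775807LL + (__int128) 1;
}
```

Because the sum is formed in `__int128`, the result is a positive 128-bit value rather than an overflowed 64-bit one.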
Hi,
When APX EGPR is enabled, the TImode move pattern *movti_internal allows
move between gpr and sse reg using constraint pair ("r","Yd"). Then a
post-reload splitter transforms such move to vec_extractv2di, while under
-msse4.1 -mno-avx EGPR is not allowed for its enabled alternative, which
caused I
Hi,
vextract/insert{if}128 cannot adopt EGPR in their memory operand, so all
related patterns should be adjusted to disable EGPR usage on them.
Also fix a wrong gpr16 attr for insertps.
Bootstrapped/regtested on x86-64-pc-linux-gnu{-m32,}
Ok for master?
gcc/ChangeLog:
* config/i38
Under APX NDD, previous TImode allocation will have issue that it was
originally allocated using continuous pair, like rax:rdi, rdi:rdx.
This will cause issue for all TImode NDD patterns. For NDD we will not
assume the arithmetic operations like add have dependency between dest
and src1, then writ
atches are basic NDD supports. In the future we will
continuously support NDD optimizations.
Bootstrapped/regtested on x86-64-pc-linux{-m32,} and SDE, also passed SPEC sde
simulation run.
Hongyu Wang (7):
[APX NDD] Restrict TImode register usage when NDD enabled
[APX NDD] Disable seg_prefixed
NDD uses the EVEX prefix, so when a segment prefix is also applied, the instruction
could exceed its 15-byte limit, especially when adding immediates. This could happen
when "e" constraint accepts any UNSPEC_TPOFF/UNSPEC_NTPOFF constant and it will
add the offset to segment register, which will be encoded usi
From: Kong Lingling
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_fixup_binary_operands_no_copy):
Add use_ndd parameter.
(ix86_can_use_ndd_p): ADD MINUS.
* config/i386/i386-protos.h (ix86_fixup_binary_operands_no_copy):
Change define.
* config/
From: Kong Lingling
APX NDD provides an extra destination register operand for several gpr
related legacy insns, so a new alternative can be adopted to operand1
with "r" constraint.
This first patch supports NDD for the add instruction, and keeps using lea
when all operands are registers since lea
From: Kong Lingling
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add NEG
support.
(ix86_expand_unary_operator): Add use_ndd parameter and adjust for NDD.
* config/i386/i386-protos.h : Add use_ndd parameter for
ix86_unary_operator_ok an
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add ROTATE
and ROTATERT.
* config/i386/i386.md (*3_1): Extend with a new
alternative to support NDD for SI/DI rotate, and adjust output
template.
(*si3_1_zext): Likewise.
(*3_1
From: Kong Lingling
gcc/ChangeLog:
* config/i386/i386.md: (addsi_1_zext): Add new alternatives for NDD and
adjust output templates.
(*add_2): Likewise.
(*addsi_2_zext): Likewise.
(*add_3): Likewise.
(*addsi_3_zext): Likewise.
(*adddi_4): Li
From: Kong Lingling
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add NOT
support.
* config/i386/i386.md (one_cmpl2): Add NDD constraints, adjust
output template.
(*one_cmpl2_1): Likewise.
(*one_cmplqi2_1): Likewise.
(*o
For left shift, there is an optimization TARGET_DOUBLE_WITH_ADD that shl
1 can be optimized to add. As NDD form of add requires src operand to
be register since NDD cannot take 2 memory src, we currently just keep
using NDD form shift instead of add.
The optimization TARGET_SHIFT1 will try to remo
From: Kong Lingling
Legacy adc patterns are commonly adopted to TImode add, when extending TImode
add to NDD version, operands[0] and operands[1] can be different, so extra move
should be emitted if those patterns have optimization when adding const0_rtx.
gcc/ChangeLog:
* config/i386/i3
From: Kong Lingling
Similar to *add3_doubleword, operands[1] may not be equal to operands[0] so
extra move is required.
gcc/ChangeLog:
* config/i386/i386.md (*sub3_doubleword): Add ndd constraints, and
emit move when operands[0] not equal to operands[1].
(*sub3_doubleword_z
gcc/ChangeLog:
* config/i386/i386.md (*movcc_noc): Extend with new constraints
to support NDD.
(*movsicc_noc_zext): Likewise.
(*movsicc_noc_zext_1): Likewise.
(*movqicc_noc): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-ndd-cmov.c: New
From: Kong Lingling
For NDD form AND insn, there are three splitter fixes after extending legacy
patterns.
1. APX NDD does not support high QImode registers like ah, bh, ch, dh, so for
some optimization splitters that generate highpart zero_extract for QImode
need to be prohibited under NDD pat
From: Kong Lingling
Similar to AND insn, two splitters need to be adjusted to prevent
misoptimization for NDD OR/XOR.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add IOR/XOR
support.
* config/i386/i386.md (3): Add NDD alternative and adjust
For shld/shrd insns, the old pattern uses match_dup 0 as its shift src and uses
+r*m as its constraint. To support NDD we added new define_insns to handle NDD
form pattern with extra input and dest operand to be fixed in register.
gcc/ChangeLog:
* config/i386/i386.md (x86_64_shld_ndd): New
Similar to LSHIFT, rshift should also emit $1 for NDD form with CX_REG as
operands[1].
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add LSHIFTRT
and RSHIFTRT.
* config/i386/i386.md (ashr3_cvt): Extend with new
alternatives to support NDD, and a
Intel APX PPX feature has been released in [1].
PPX stands for Push-Pop Acceleration. PUSH/PUSH2 and its corresponding POP
can be marked with a 1-bit hint to indicate that the POP reads the
value written by the PUSH from the stack. The processor tracks these marked
instructions internally and fast
anks for the suggestion.
Updated patch with just 1 new UNSPEC and removed cfa handling.
Hongtao Liu wrote on Mon, Nov 20, 2023 at 14:46:
>
> On Fri, Nov 17, 2023 at 3:26 PM Hongyu Wang wrote:
> >
> > Intel APX PPX feature has been released in [1].
> >
> > PPX stands for Push-Pop Accelera
Hi,
The push2/pop2 operand order does not match the binutils implementation
for AT&T syntax that it will first push operands[2] then operands[1].
Correct it by reversing the operand order for AT&T syntax.
Bootstrapped/regtested on x86-64-linux-pc-gnu{-m32,}
Ok for master?
gcc/ChangeLog:
* co
Hi,
On linux x86-64, -fomit-frame-pointer was by default enabled so the
push2pop2 tests cfi scans are based on it. On other target with
-fno-omit-frame-pointer the cfi scan will be wrong as the frame pointer
is pushed first. Add -fomit-frame-pointer to the tests that are related
to the cfi scan.
OK
h previous constraints.
3. Support constraint mapping for all gpr related common constraints in
inline asm.
Bootstrapped/regtested x86_64-linux-gnu.
Ok for trunk?
Hongyu Wang (2):
[APX EGPR] middle-end: Add index_reg_class with insn argument.
[APX EGPR] Handle GPR16 only vector move i
Like base_reg_class, INDEX_REG_CLASS also does not support backend insn.
Add index_reg_class with insn argument for lra/reload usage.
gcc/ChangeLog:
* addresses.h (index_reg_class): New wrapper function like
base_reg_class.
* doc/tm.texi: Document INSN_INDEX_REG_CLASS.
traint.
(jp): Likewise for "p" constraint.
* config/i386/i386.h (enum reg_class): Add new reg class
GENERAL_GPR16.
Co-authored-by: Hongyu Wang
Co-authored-by: Hongtao Liu
---
gcc/config/i386/constraints.md | 59 +-
gcc/config/i386/i386.h
s to EGPR prohibited constraints.
(ix86_md_asm_adjust): Calls map_egpr_constraints.
* config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-inline-gpr-norex2.c: New test.
Co-authored-by: Hongyu Wang
Co-authored-by: Hon
.
(apx_none): New enum value.
(apx_egpr): Likewise.
(apx_push2pop2): Likewise.
(apx_ndd): Likewise.
(apx_all): Likewise.
* doc/invoke.texi: Document mapxf.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-1.c: New test.
Co-authored-by: Hongyu
): Ditto.
* reload1.cc (maybe_fix_stack_asms): Ditto.
Co-authored-by: Hongyu Wang
Co-authored-by: Hongtao Liu
---
gcc/addresses.h| 19 +++
gcc/doc/tm.texi| 14 ++
gcc/doc/tm.texi.in | 14 ++
gcc/lra-constraints.cc | 15
): Likewise.
(aesimc): Likewise.
(aeskeygenassist): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
tests.
Co-authored-by: Hongyu Wang
Co-authored-by: Hongtao Liu
---
gcc/config/i386/i386-protos.h
-names.c: New test.
* gcc.target/i386/apx-spill_to_egprs-1.c: Likewise.
* gcc.target/i386/apx-interrupt-1.c: Likewise.
Co-authored-by: Hongyu Wang
Co-authored-by: Hongtao Liu
---
gcc/config/i386/i386-protos.h | 1 +
gcc/config/i386/i386.cc
t): Likewise.
(pclmulqdq): Likewise.
(vgf2p8affineinvqb_): Likewise.
(vgf2p8affineqb_): Likewise.
(vgf2p8mulb_): Likewise.
Co-authored-by: Hongyu Wang
Co-authored-by: Hongtao Liu
---
gcc/config/i386/i386.md | 42 +++---
gcc/config/i386/mmx.md | 143 +
.
(INSN_INDEX_REG_CLASS): Likewise.
(enum reg_class): Add INDEX_GPR16.
(GENERAL_GPR16_REGNO_P): Define.
* config/i386/i386.md (gpr32): New attribute.
Co-authored-by: Hongyu Wang
Co-authored-by: Hongtao Liu
---
gcc/config/i386/i386-protos.h | 3 ++
gcc/config/i386/i386.cc
intrinsic tests.
Co-authored-by: Hongyu Wang
Co-authored-by: Hongtao Liu
---
gcc/config/i386/sse.md| 73
.../i386/apx-legacy-insn-check-norex2.c | 106 ++
2 files changed, 155 insertions(+), 24 deletions(-)
diff --git a/gcc/config/i386
For vector move insns like vmovdqa/vmovdqu, their evex counterparts
require explicit suffix 64/32/16/8. The usage of these instructions
is prohibited under AVX10_1 or AVX512F, so we select
vmovaps/vmovups for vector load/store insns that contain EGPR if
there is no AVX512VL, and keep the origi
et its constraint to jm and set attr_gpr32 to 0.
(vec_set_lo_): Likewise.
(vec_set_lo_): Likewise for SF/SI modes.
(vec_set_hi_): Likewise.
(vec_set_hi_): Likewise for SF/SI modes.
(vec_set_hi_): Likewise.
(vec_set_lo_): Likewise.
(avx2_set_hi_v32qi
:
* lib/target-supports.exp: Add apxf check.
* gcc.target/i386/apx-legacy-insn-check-norex2.c: New test.
* gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: New assembler
test.
Co-authored-by: Hongyu Wang
Co-authored-by: Hongtao Liu
---
gcc/config/i386/i386.md
Hi,
According to APX spec, the pushp/popp pairs should be matched,
otherwise the PPX hint cannot take effect and cause performance loss.
In the ix86_expand_epilogue, there are several optimizations that may
cause the epilogue using mov to restore the regs. Check if PPX applied
and prevent usage o
apx spec, the mismatched
pushp/popp pair does confuse the fast-forwarding logic and turns off
the PPX optimization. We just need to make sure every pushp for a
certain reg has corresponding popp for that reg.
Richard Biener wrote on Tue, Jul 2, 2024 at 16:18:
>
> On Tue, Jul 2, 2024 at 5:24 AM Hongyu Wan
Hi,
For APX ccmp, current infrastructure will always generate cstore for
the ccmp flag user, like
cmpe %rcx, %r8
ccmpnel %rax, %rbx
seta %dil
add %rcx, %r9
add %r9, %rdx
testb %dil, %dil
je .L2
For such case, the legacy
Hi,
According to the instruction spec of AVX512BF16, the convert from float
to BF16 is not a simple truncation. It has special handling for
denormal/nan, even for normal float it will add an extra bias according
to the least significant bit for bf number. This means we cannot use the
vcvtne2ps2bf1
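The difference between plain truncation and a rounding conversion of the kind described here can be seen at the bit level. The sketch below uses the common software formulation of round-to-nearest-even, which adds a bias that depends on the least significant kept bit; it is an assumption that this matches the hardware behaviour for normal values, and NaN and denormal handling are deliberately left out. All function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back. */
static uint32_t float_bits(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

static float float_from_bits(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof f);
    return f;
}

/* Simple truncation: keep the upper 16 bits, drop the rest. */
static uint16_t bf16_truncate(float f)
{
    return (uint16_t) (float_bits(f) >> 16);
}

/* Round-to-nearest-even sketch for finite, normal inputs only.
   The bias is 0x7FFF plus the lowest bit that will be kept, so exact
   ties round towards the value whose kept LSB is zero. */
static uint16_t bf16_round_nearest_even(float f)
{
    uint32_t bits = float_bits(f);
    uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return (uint16_t) (bits >> 16);
}
```

For an input whose bit pattern is 0x3F818000 (an exact tie with an odd kept bit), truncation keeps 0x3F81 while the rounding version yields 0x3F82, which illustrates why the two conversions are not interchangeable.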
> Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b?
We can still deal with BFmode permutation the same way as HFmode, so
the change in ix86_vectorize_vec_perm_const can be preserved.
Hongtao Liu wrote on Mon, Jul 15, 2024 at 09:40:
>
> On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wa
html
Hongyu Wang (3):
[APX CCMP] Support APX CCMP
[APX CCMP] Adjust startegy for selecting ccmp candidates
[APX CCMP] Support ccmp for float compare
gcc/ccmp.cc| 12 +-
gcc/config/i386/i386-expand.cc | 164 +
gcc/config/i386/
The ccmp insn itself doesn't support fp compare, but x86 has fp comi
insn that changes EFLAGS which can be the scc input to ccmp. Allow
scalar fp compare in ix86_gen_ccmp_first except ORDERED/UNORDERED
compare which cannot be identified in ccmp.
gcc/ChangeLog:
* config/i386/i386-expand.cc
For general ccmp scenario, the tree sequence is like
_1 = (a < b)
_2 = (c < d)
_3 = _1 & _2
current ccmp expanding will try to swap compare order for _1 and _2,
compare the cost/cost2 between compare _1 and _2 first, then return the
sequence with lower cost.
For x86 ccmp, we don't support FP com
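For reference, a tree sequence of the shape shown in this message can arise from C source like the following. This is a minimal sketch; the function name `both_less` is made up, and whether a given compiler actually emits a ccmp instruction for it depends on the target options (for example -mapxf) and on the cost comparison the message describes.

```c
/* Two independent comparisons combined with a bitwise AND, mirroring
     _1 = (a < b)
     _2 = (c < d)
     _3 = _1 & _2
   Neither comparison has side effects, so either one may be evaluated first. */
static int both_less(int a, int b, int c, int d)
{
    int t1 = (a < b);
    int t2 = (c < d);
    return t1 & t2;
}
```

Since the two comparisons are independent, the expander is free to pick the order with the lower estimated cost.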