And split it into the GPR version of the instruction after reload.
This enables the optimization below for 16/32/64-bit vector bit_op:
- movd (%rdi), %xmm0
- movd (%rsi), %xmm1
- pand %xmm1, %xmm0
- movd %xmm0, (%rdi)
+ movl (%rsi), %eax
+ andl %eax, (%rdi)
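For illustration, a source-level operation of the kind this affects might look like the sketch below. The type name v4qi and the function and_v4qi are hypothetical, and the code assumes the GCC vector extension; whether the two-instruction GPR sequence is actually emitted depends on the compiler version and flags.

```c
/* Hypothetical 32-bit vector type: four bytes, declared with the
   GCC vector extension.  */
typedef unsigned char v4qi __attribute__ ((vector_size (4)));

/* AND a 32-bit vector in memory with another one.  Both operands
   fit in a general-purpose register, so the operation does not
   inherently need an SSE register.  */
void
and_v4qi (v4qi *dst, const v4qi *src)
{
  *dst &= *src;
}
```

Here dst plays the role of (%rdi) and src the role of (%rsi) in the sequence above.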
The patch only handles load/store (including ctor/permutation, except
gather/scatter) for complex type; other operations don't need to be
handled since they will be lowered by pass cplxlower. (MASK_LOAD is not
supported for complex type, so there is no need to handle it either.)
Instead of support vector(2) _C
And split it to GPR-version instruction after reload.
> ?r was introduced under the assumption that we want vector values
> mostly in vector registers. Currently there are no instructions with
> memory or immediate operand, so that made sense at the time. Let's
> keep ?r until logic instructions w
And split it after reload.
>IMO, the only case it is worth adding is a direct immediate store to
>memory, which HJ recently added.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/106038
* config/i386/mmx.md (3): Extend to AND mem,
V2 update:
Handle VMAT_ELEMENTWISE, VMAT_CONTIGUOUS_PERMUTE, VMAT_STRIDED_SLP,
VMAT_CONTIGUOUS_REVERSE, VMAT_CONTIGUOUS_DOWN for complex type.
I've run SPECspeed 2017 627.cam4_s; there are some vectorization cases,
but no big performance impact (since this patch only handles load/store).
Any co
And split it after reload.
> You will need ix86_binary_operator_ok insn constraint here with
> corresponding expander using ix86_fixup_binary_operands_no_copy to
> prepare insn operands.
Split define_expand with just register_operand, and allow
memory/immediate in define_insn, assume combine/forwp
__builtin_cexpi can't be vectorized since there's a gap between it and
the vectorized sincos version. (In libmvec, it takes a double and two
double pointers and returns nothing.) And some vectorization
opportunities are lost if sin & cos are optimized to cexpi before the
vectorizer.
I'm trying to add vect_r
> My original comments still stand (it feels like this should be more generic).
> Can we go the way lowering complex loads/stores first? A large part
> of the testcases
> added by the patch should pass after that.
This is the patch as suggested, one additional change is handling COMPLEX_CST
for r
And split it after reload.
gcc/ChangeLog:
PR target/106038
* config/i386/mmx.md (3): New define_expand, it's
original "3".
(*3): New define_insn, it's original
"3" be extended to handle memory and immediate
operand with ix86_binary_operator_ok. Also
r13-1762-gf9d4c3b45c5ed5f45c8089c990dbd4e181929c3d lower complex type
move to scalars, but testcase pr23911 is supposed to scan __complex__
constant which is never available, so adjust testcase to scan
IMAGPART/REALPART_EXPR constants separately.
Pushed as obvious patch.
gcc/testsuite/ChangeLog
For neg, the patch creates a vec_init as [ a, -a, a, -a, ... ] and no
vec_step is needed to update the vectorized iv since vf is always a multiple
of 2 (negative * negative is positive).
For shift, the patch creates a vec_init as [ a, a >> c, a >> 2*c, ..]
and vec_step as [ c * nunits, c * nunits, c * nuni
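As a rough sketch, scalar loops with this kind of nonlinear induction variable could look like the following. The function names neg_iv and shr_iv are hypothetical and are not taken from the patch or its testcases.

```c
/* Negation induction: out = { a, -a, a, -a, ... }.  */
void
neg_iv (int *out, int a, int n)
{
  for (int i = 0; i < n; i++)
    {
      out[i] = a;
      a = -a;
    }
}

/* Shift induction: out = { a, a >> c, a >> 2*c, ... }.  */
void
shr_iv (int *out, int a, int c, int n)
{
  for (int i = 0; i < n; i++)
    {
      out[i] = a;
      a >>= c;
    }
}
```

In both loops the value stored in iteration i depends only on the initial a (and c) and on i, which is what lets a vector of initial values be built up front.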
/var/tmp/portage/sys-devel/gcc-14.0.0_pre20230806/work/gcc-14-20230806/libgfortran/generated/matmul_i1.c:
In function ‘matmul_i1_avx512f’:
/var/tmp/portage/sys-devel/gcc-14.0.0_pre20230806/work/gcc-14-20230806/libgfortran/generated/matmul_i1.c:1781:1:
internal compiler error: RTL check: expected
Similar to r14-2786-gade30fad6669e5, the patch is for V4HF/V2HFmode.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/110762
* config/i386/mmx.md (3): Changed from define_insn
to define_expand and break into ..
(v4
Don't access leaf 7 subleaf 1 unless subleaf 0 says it is
supported via EAX.
Intel documentation says invalid subleaves return 0. We had been
relying on that behavior instead of checking the max subleaf number.
It appears that some Sandy Bridge CPUs return at least the subleaf 0
EDX value for subl
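The guard can be sketched as below. This is not the code in cpuinfo.h: the callback type cpuid_query, the function leaf7_subleaf1_edx and the fake CPU are all hypothetical, introduced only so the logic can be exercised without executing CPUID. The fake models, as an assumption, a CPU that reports a maximum subleaf of 0 yet echoes a non-zero EDX for subleaf 1.

```c
/* Hypothetical callback standing in for the CPUID instruction:
   fills regs[0..3] with EAX, EBX, ECX, EDX.  */
typedef void (*cpuid_query) (unsigned leaf, unsigned subleaf,
			     unsigned regs[4]);

/* Return EDX of leaf 7 subleaf 1, or 0 if subleaf 0 does not
   report subleaf 1 as supported via EAX.  */
unsigned
leaf7_subleaf1_edx (cpuid_query query)
{
  unsigned regs[4];

  query (7, 0, regs);
  unsigned max_subleaf_level = regs[0];
  if (max_subleaf_level < 1)
    return 0;

  query (7, 1, regs);
  return regs[3];
}

/* Fake CPU (assumption): max subleaf is 0, but every subleaf of
   leaf 7 returns the same non-zero EDX.  */
void
fake_quirky_cpu (unsigned leaf, unsigned subleaf, unsigned regs[4])
{
  (void) subleaf;
  regs[0] = 0;
  regs[1] = 0;
  regs[2] = 0;
  regs[3] = (leaf == 7) ? 0x10u : 0;
}
```

With the check in place, the bogus EDX bits of the fake CPU are never interpreted as subleaf 1 features.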
> Please rather do it in a more self-descriptive way, as proposed in the
> attached patch. You won't need a comment then.
>
Adjusted in V2 patch.
Don't access leaf 7 subleaf 1 unless subleaf 0 says it is
supported via EAX.
Intel documentation says invalid subleaves return 0. We had been
relying
This minor fix is preapproved in [1].
Committed to trunk.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626758.html
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_available_features):
Rename local variable subleaf_level to max_subleaf_level.
---
gcc/common/config
Also add ix86_partial_vec_fp_math to the condition of V2HF/V4HF named
patterns in order to avoid generation of partial vector V8HFmode
trapping instructions.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ok for trunk?
gcc/ChangeLog:
PR target/110832
* config/i386/mmx.md
Currently we have 3 different independent tunes for gather:
"use_gather,use_gather_2parts,use_gather_4parts".
Similarly, for scatter there are
"use_scatter,use_scatter_2parts,use_scatter_4parts".
The patch supports 2 standardizing options to enable/disable
vectorization for all gather/scatter instructio
For more details of GDS (Gather Data Sampling), refer to
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html
After the microcode update, there's a performance regression. To avoid that,
the patch disables gather gene
Rename original use_gather to use_gather_8parts, Support
-mtune-ctrl={,^}use_gather to set/clear tune features
use_gather_{2parts, 4parts, 8parts}. Support the new option -mgather
as alias of -mtune-ctrl=, use_gather, ^use_gather.
Similar for use_scatter.
How about this version?
gcc/ChangeLog:
vmovapd can enable register renaming and has the same code size as
vmovsd. Similarly for vmovsh vs vmovaps: vmovaps is 1 byte shorter than
vmovsh.
When TARGET_AVX512VL is not available, still generate
vmovsd/vmovss/vmovsh to avoid vmovapd/vmovaps zmm16-31.
Bootstrapped and regtested on x86_64-pc-linux-gn
Alderlake-N is E-core only, add it as an alias of Alderlake.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Any comments?
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_intel_cpu): Detect
Alderlake-N.
* common/config/i386/i386-common.cc (alias_table): Suppo
---
htdocs/gcc-14/changes.html | 4
1 file changed, 4 insertions(+)
diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index eae25f1a..2c888660 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -151,6 +151,10 @@ a work-in-progress.
-march=luna
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512f-pr88464-2.c: Add -mgather to
options.
* gcc.target/i386/avx512f-pr88464-3.c: Ditto.
* gcc.target/i386/avx512f-pr88464-4.c: Ditto.
* gcc.target/i386/avx512f-pr88464-6.c: Ditto.
* gcc.target/i386/avx51
Committed as an obvious fix.
gcc/testsuite/ChangeLog:
* gcc.target/i386/invariant-ternlog-1.c: Only scan %rdx under
TARGET_64BIT.
---
gcc/testsuite/gcc.target/i386/invariant-ternlog-1.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.target/i386
I noticed there's some refactoring in vectorizable_conversion
for code_helper, so I've adjusted my patch to that.
Here's the patch I'm going to commit.
We already use an intermediate type in case WIDEN, but not for NONE;
this patch extends that.
gcc/ChangeLog:
PR target/110018
* tre
If mem_addr points to a memory region with less than whole vector size
bytes of accessible memory and k is a mask that would prevent reading
the inaccessible bytes from mem_addr, add UNSPEC_MASKLOAD to prevent
it from being transformed into vpblendd.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32
> > Hmm, good question. GENERIC has a direct truncation to unsigned char
> > for example, the C standard generally says if the integral part cannot
> > be represented then the behavior is undefined. So I think we should be
> > safe here (0x1.0p32 doesn't fit an int).
>
> We should be following An
The new assembly looks better than original one, so I adjust those testcases.
Ok for trunk?
gcc/testsuite/ChangeLog:
PR tree-optimization/110371
PR tree-optimization/110018
* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Scan scvt +
sxtw instead of scvt + zip1 + z
When there are multiple operands in vec_oprnds0, vec_dest will be
overwritten with vectype_out, but in the multi_step_cvt case, cvt_type is
expected. It caused an ICE in verify_gimple_in_cfg.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and aarch64-linux-gnu.
Ok for trunk?
gcc/ChangeLog:
__bfloat16 is redefined from typedef short to real __bf16 since GCC
V13. The patch issues a warning for potential silent implicit
conversion between __bf16 and short where users may only expect a
data movement.
To avoid too many false positives, the warning is only issued under
TARGET_AVX512BF16.
Bootstrapp
At the RTL level, we cannot guarantee that the maskstore is not optimized
into other full-memory accesses, because the current implementations are
equivalent in terms of pattern. To solve this potential problem, this patch
refines the pattern of the maskstore and the intrinsics with unspec.
One thing I'm
pass_insert_vzeroupper is under condition
TARGET_AVX && TARGET_VZEROUPPER
&& flag_expensive_optimizations && !optimize_size
But the documentation of mvzeroupper doesn't mention that the insertion
requires -O2 and above, which may confuse users when they explicitly
use -Os -mvzeroupper.
mvzeroupp
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/82735
* config/i386/i386.cc (ix86_avx_u127_mode_needed): Don't emit
vzeroupper for vzeroupper call_insn.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-vzeroupper-30.
vpternlog is also used for optimization which doesn't need any valid
input operand, in that case, the destination is used as input in the
instruction and that creates a false dependence.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
gcc/ChangeLog:
PR t
For testcase
void __cond_swap(double* __x, double* __y) {
  bool __r = (*__x < *__y);
  auto __tmp = __r ? *__x : *__y;
  *__y = __r ? *__y : *__x;
  *__x = __tmp;
}
GCC-14 with -O2 and -march=x86-64 options generates the following code:
__cond_swap(double*, double*):
movsd xmm1, QWORD
They should have the same cost as vector mode since both generate
pand/pandn/pxor/por instructions.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
* config/i386/i386.cc (ix86_rtx_costs): Adjust rtx_cost for
DF/SFmode AND/IOR/XOR/ANDN operations.
We have ix86_expand_sse_fp_minmax to detect min/max semantics, but
it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false. For
the testcase in the PR, there's an extra move from cmp_op0 to if_true,
so ix86_expand_sse_fp_minmax fails to match.
This patch adds pre_reload splitter to detect the
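A minimal source-level sketch of the min semantics in question follows. The function names are hypothetical. The first form mirrors the x86 scalar min behavior of returning the second operand when the comparison is false, including when either input is a NaN. The second form models the "extra move" only loosely: at source level a compiler will normally fold the copy, so it illustrates the shape of the problem rather than reproducing the RTL from the PR.

```c
/* Select the first operand only when it compares strictly less;
   otherwise (including NaN inputs) return the second operand.  */
double
min_like_sse (double a, double b)
{
  return a < b ? a : b;
}

/* Same result, but the selected value is a copy of the compared
   operand rather than the operand itself.  */
double
min_with_copy (double a, double b)
{
  double t = a;
  return a < b ? t : b;
}
```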
> Please split the above pattern into two, one emitting UNSPEC_IEEE_MAX
> and the other emitting UNSPEC_IEEE_MIN.
Split.
> The test involves blendv instruction, which is SSE4.1, so it is
> pointless to test it without -msse4.1. Please add -msse4.1 instead of
> -march=x86_64 and use sse4_runtime
False dependency happens when destination is only updated by
pternlog. There is no false dependency when destination is also used
in source. So either a pxor should be inserted, or input operand
should be set with constraint '0'.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to p
Similar to what we did for cmpxchg, but extended to all
ix86_comparison_int_operator since cmpccxadd sets EFLAGS exactly the same
as CMP.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,},
Ok for trunk?
gcc/ChangeLog:
PR target/110591
* config/i386/sync.md (cmpccxadd_): Add a new
Here's updated patch.
1. use optimize_insn_for_speed_p instead of using optimize_function_for_speed_p.
2. explicitly move memory to dest register to avoid false dependence in
one_cmpl pattern.
False dependency happens when destination is only updated by
pternlog. There is no false dependency whe
Similar to what we did for CMPXCHG, but extended to all
ix86_comparison_int_operator since CMPCCXADD sets EFLAGS exactly the same
as CMP.
When operand order in CMP insn is same as that in CMPCCXADD,
CMP insn can be eliminated directly.
When operand order is swapped in CMP insn, only optimize
cmpccxadd +
Antony Polukhin 2023-07-11 09:51:58 UTC
There's a typo at
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87
It should be `|| !test3() || !test3r()` rather than `|| !te
> The quoted patch shows -shared in context and you didn't post a
> backport version
> to look at. But yes, we shouldn't change -shared behavior on a
> branch, even less so make it
> inconsistent between targets.
Here's the patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/89701
* common.opt: Refactor -fcf-protection= to support combination
of param.
* lto-wrapper.c (merge_and_complain): Adjusted.
* opts.c (parse_cf_protection_opt
> I think this could be simplified if you use either EnumSet or
> EnumBitSet instead in common.opt for `-fcf-protection=`.
Use EnumSet instead of EnumBitSet since CF_FULL is not power of 2.
It is a bit tricky for sets classification, cf_branch and cf_return
should be in different sets, but they bo
r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost
calculation when the preferred register classes are not known yet.
It regressed powerpc PR109610 and PR109858; it looks too aggressive to use
NO_REGS when the mode can be allocated with GENERAL_REGS.
The patch takes a step back, still use
Also for 64-bit vector abs intrinsics _mm_abs_{pi8,pi16,pi32}.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/109900
* config/i386/i386.cc (ix86_gimple_fold_builtin): Fold
_mm{,256,512}_abs_{epi8,epi16,epi32,epi64} and
r12-5595-gc39d77f252e895306ef88c1efb3eff04e4232554 adds 2 splitters to
transform notl + pbroadcast + pand to pbroadcast + pandn for
VI124_AVX2, which leaves out all DI-element-size ones as
well as all 512-bit ones.
This patch extends the splitter to VI_AVX2, which will handle DImode for
AVX2, and V64QI
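A loop of roughly the following shape is the kind of source where a scalar NOT, a broadcast and a vector AND can meet. The function name andn_bcast is hypothetical, and the 64-bit element type is chosen only to illustrate the DI-element-size case mentioned above; whether the splitter actually fires depends on the target flags.

```c
/* dst[i] = src[i] & ~c, with a loop-invariant scalar c.
   The NOT can be applied to the scalar before broadcasting, or
   folded into an and-not after broadcasting c itself.  */
void
andn_bcast (unsigned long long *dst, const unsigned long long *src,
	    unsigned long long c, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i] & ~c;
}
```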
lzcnt/tzcnt has been fixed since Skylake, popcnt has been fixed since
Icelake. At least for Icelake and later Intel Core processors, the
errata tune is not needed. And the tune isn't needed for ATOM either.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
gcc/Chang
Hi:
This patch supports cond_add/sub/mul/div expanders for vector float/double.
There are still cond_fma/fms/fnms/fma/max/min/xor/ior/and left, for which I
failed to come up with a testcase to validate them.
Also cond_add/sub/mul for vector integer.
Bootstrap is ok, survive the regression test on
gcc/ChangeLog:
* config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
* config/i386/i386.c (enum x86_64_reg_class): Add
X86_64_SSEHF_CLASS.
(merge_classes): Handle X86_64_SSEHF_CLASS.
(examine_argument): Ditto.
(construct_container): Ditto.
AVX512FP16 feature and scalar _Float16
instructions.
liuhongt (5):
Update hf soft-fp from glibc.
[i386] Enable _Float16 type for TARGET_SSE2 and above.
[i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
truncations.
Support -fexcess-precision=16 which will enable
gcc/ada/ChangeLog:
* gcc-interface/misc.c (gnat_post_options): Issue an error for
-fexcess-precision=16.
gcc/c-family/ChangeLog:
* c-common.c (excess_precision_mode_join): Update below comments.
(c_ts18661_flt_eval_method): Set excess_precision_type to
EXC
libgcc/ChangeLog
* soft-fp/eqhf2.c: New file.
* soft-fp/extendhfdf2.c: New file.
* soft-fp/extendhfsf2.c: New file.
* soft-fp/extendhfxf2.c: New file.
* soft-fp/half.h (FP_CMP_EQ_H): New macro.
* soft-fp/truncdfhf2.c: New file
* soft-fp/trunc
libgcc/ChangeLog:
* config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro.
* config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto.
* config/i386/sfp-machine.h (_FP_NANSIGN_H): Ditto.
* config/i386/t-softfp: Add hf soft-fp.
* config.host: Add i386/64/t-softf
gcc/ChangeLog:
* config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
(_mm256_set_ph): Likewise.
(_mm512_set_ph): Likewise.
(_mm_setr_ph): Likewise.
(_mm256_setr_ph): Likewise.
(_mm512_setr_ph): Likewise.
(_mm_set1_ph): Likewise.
From: "Guo, Xuepeng"
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_available_features):
Detect FEATURE_AVX512FP16.
* common/config/i386/i386-common.c
(OPTION_MASK_ISA_AVX512FP16_SET,
OPTION_MASK_ISA_AVX512FP16_UNSET,
OPTION_MASK_ISA2_AVX512FP1
Hi:
This is a follow up of [1].
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Pushed to trunk.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576514.html
gcc/ChangeLog:
* config/i386/sse.md (cond_): New expander.
(cond_mul): Ditto.
gcc/testsuite/ChangeLo
Hi:
The define_peephole2 which is added by r12-2640-gf7bf03cf69ccb7dc
should only work on general registers, considering that x86 also
supports mov instructions between gpr, sse reg, mask reg, limiting the
peephole2 predicate to general_reg_operand.
I failed to construct a testcase, but I believ
Hi:
This patch adds expanders cond_{fma,fms,fnma,fnms}
for vector float/double modes.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Pushed to trunk.
gcc/ChangeLog:
* config/i386/sse.md (cond_fma): New expander.
(cond_fms): Ditto.
(cond_fnma): Ditto.
Hi:
Pushed to trunk as an obvious fix.
gcc/testsuite/ChangeLog:
* gcc.target/i386/cond_op_addsubmul_d-2.c: Add
dg-require-effective-target for avx512.
* gcc.target/i386/cond_op_addsubmul_q-2.c: Ditto.
* gcc.target/i386/cond_op_addsubmul_w-2.c: Ditto.
* gc
Hi:
Together with the previous 3 patches, all cond_op expanders of vector
modes are supported (if they have a corresponding avx512 mask instruction).
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
liuhongt (3):
[i386] Support cond_{smax,smin,umax,umin} for vector integer modes
gcc/ChangeLog:
* config/i386/sse.md (cond_): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/i386/cond_op_maxmin_b-1.c: New test.
* gcc.target/i386/cond_op_maxmin_b-2.c: New test.
* gcc.target/i386/cond_op_maxmin_d-1.c: New test.
* gcc.target/i386/cond
gcc/ChangeLog:
* config/i386/sse.md (cond_): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/i386/cond_op_anylogic_d-1.c: New test.
* gcc.target/i386/cond_op_anylogic_d-2.c: New test.
* gcc.target/i386/cond_op_anylogic_q-1.c: New test.
* gcc.target/i38
gcc/ChangeLog:
* config/i386/sse.md (cond_): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/i386/cond_op_maxmin_double-1.c: New test.
* gcc.target/i386/cond_op_maxmin_double-2.c: New test.
* gcc.target/i386/cond_op_maxmin_float-1.c: New test.
* gcc.ta
Hi:
---
OK, I think sth is amiss here upthread. insv/extv do look like they
are designed
to work on integer modes (but docs do not say anything about this here).
In fact the caller of extract_bit_field_using_extv is named
extract_integral_bit_field. Of course nothing seems to check what kind of
m
Hi:
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
Ok for trunk?
gcc/ChangeLog:
PR rtl-optimization/101796
* simplify-rtx.c
(simplify_context::simplify_binary_operation_1): Simplify
vector shift/rotate with const_vec_duplicate to vector
shift/rot
Hi:
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
gcc/ChangeLog:
* config/i386/sse.md (cond_): New expander.
(VI248_AVX512VLBW): New mode iterator.
* config/i386/predicates.md
(nonimmediate_or_const_vec_dup_operand): New predicate.
gcc/testsuite/ChangeLo
Hi:
AVX512F supports vscalefs{s,d}, which is the same as ldexp except that the
second operand must be floating point.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
gcc/ChangeLog:
PR target/98309
* config/i386/i386.md (ldexp3): Extend to vscalefs[sd]
when TARGET_
Hi:
Add define_insn_and_split to combine avx_vec_concatv16si/2 and
avx512f_zero_extendv16hiv16si2_1 since the latter already zero_extend
the upper bits, similar for other patterns which are related to
pmovzx{bw,wd,dq}.
It will do optimization like
- vmovdqa %ymm0, %ymm0# 7 [c=4 l=
Hi:
This is the patch I'm going to check in.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,};
2021-08-12 Uros Bizjak
gcc/ChangeLog:
PR target/98309
* config/i386/i386.md (avx512f_scalef2): New
define_insn.
(ldexp3): Adjust for new define_insn.
Hi:
This is another patch to optimize vec_perm_expr to match vpmov{dw,qd,wb}
under AVX512.
For scenarios(like pr101846-2.c) where the upper half is not used, this patch
generates better code with only one vpmov{wb,dw,qd} instruction. For
scenarios(like pr101846-3.c) where the upper half is actu
Hi:
Here's updated patch which does 3 things:
1. Support vpermw/vpermb in ix86_expand_vec_one_operand_perm_avx512.
2. Support 256/128-bits vpermi2b in ix86_expand_vec_perm_vpermt2.
3. Add define_insn_and_split to optimize specific vector permutation to
vpmov{dw,wb,qd}.
Bootstrapped and regtes
Hi:
avx512f_scalef2 only accepts register_operand for operands[1],
so force it to reg in ldexp3.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk.
gcc/ChangeLog:
PR target/101930
* config/i386/i386.md (ldexp3): Force operands[1] to
reg.
gcc/testsuite
Hi:
This patch adds a new x86 tune named X86_TUNE_V2DF_REDUCTION_PREFER_HADDPD
to enable haddpd for v2df vector reduction; the tune is disabled by default.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
Ok for trunk?
gcc/ChangeLog:
PR target/97147
* config/i386/i386.h
This reverts commit 872da9a6f664a06d73c987aa0cb2e5b830158a10.
PR target/101936
PR target/101929
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
Pushed to master.
---
gcc/config/i386/i386.c | 6 +-
gcc/config/i386/i386.h | 1 -
gcc/config/i386/x8
Performance impact for the commit with option:
-march=x86-64 -O2 -ftree-vectorize -fvect-cost-model=very-cheap
SPEC2017 fprate
503.bwaves_r     BuildSame
507.cactuBSSN_r  -0.04
508.namd_r        0.14
510.parest_r     -0.54
511.povray_r      0.10
519.lbm_r B
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Pushed to trunk.
gcc/ChangeLog:
PR target/102016
* config/i386/sse.md (*avx512f_pshufb_truncv8hiv8qi_1): Add
TARGET_AVX512BW to condition.
gcc/testsuite/ChangeLog:
PR target/102016
* gcc.target/i3
Also optimize below 3 forms to vpternlog, op1, op2, op3 are
register_operand or unary_p as (not reg)
A: (any_logic (any_logic op1 op2) op3)
B: (any_logic (any_logic op1 op2) (any_logic op3 op4)) op3/op4 should
be equal to op1/op2
C: (any_logic (any_logic (any_logic:op1 op2) op3) op4) op3/op4 shoul
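To make the mapping concrete, the sketch below models a three-input logic operation as an 8-bit truth table, which is how vpternlog's immediate encodes it. The helper names ternlog and imm_and_or are hypothetical, and the constants 0xF0/0xCC/0xAA are the conventional per-operand patterns for deriving the immediate; this is a model, not the code in sse.md.

```c
/* Evaluate a three-input bitwise function given as an 8-bit truth
   table: for every bit position, the bits of a, b and c form a
   3-bit index (a is the most significant) that selects one bit of
   imm.  */
unsigned int
ternlog (unsigned int a, unsigned int b, unsigned int c,
	 unsigned char imm)
{
  unsigned int result = 0;
  for (int bit = 0; bit < 32; bit++)
    {
      unsigned int idx = (((a >> bit) & 1) << 2)
			 | (((b >> bit) & 1) << 1)
			 | ((c >> bit) & 1);
      result |= (unsigned int) ((imm >> idx) & 1) << bit;
    }
  return result;
}

/* Immediate for form A with (op1 & op2) | op3: apply the same
   expression to the canonical operand patterns.  */
unsigned char
imm_and_or (void)
{
  const unsigned char A = 0xF0, B = 0xCC, C = 0xAA;
  return (unsigned char) ((A & B) | C);
}
```

A negated operand fits the same scheme: for (~op1 & op2) ^ op3 the immediate is (~0xF0 & 0xCC) ^ 0xAA, truncated to 8 bits.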
Hi:
This patch extends change_zero_ext to move illegitimate constants
into the constant pool, which enables the simplification below:
Trying 5 -> 7:
5: r85:V4SF=[`*.LC0']
REG_EQUAL const_vector
7: r84:V4SF=vec_select(vec_concat(r85:V4SF,r85:V4SF),parallel)
REG_DEAD r85:V4SF
gcc/ChangeLog:
PR target/101989
* config/i386/sse.md (_vternlog):
Enable avx512 embedded broadcast.
(*_vternlog_all): Ditto.
(_vternlog_mask): Ditto.
gcc/testsuite/ChangeLog:
PR target/101989
* gcc.target/i386/pr101989-broadcast-1.c: New te
Pushed to trunk as an obvious fix.
gcc/testsuite/ChangeLog:
PR target/101989
* gcc.target/i386/avx2-shiftqihi-constant-1.c: Add -mno-avx512f.
* gcc.target/i386/sse2-shiftqihi-constant-1.c: Add -mno-avx
---
gcc/testsuite/gcc.target/i386/avx2-shiftqihi-constant-1.c | 2 +-
This patch is a follow-up to [1]; it folds all shufps/shufpd builtins into
gimple.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2019-May/521983.html
gcc/
PR target/98167
PR target/43147
* config/i386/i386.c (ix86_
When gimple simplification tries to combine op and vec_cond_expr to cond_op,
it doesn't check whether the mask type matches. It causes an ICE when
expanding cond_op with a mismatched mode.
This patch adds a function named cond_vectorized_internal_fn_supported_p
to additionally check mask type than vectorized_in
Currently for evex vpcmpeqb instruction, we have two forms of rtl
template representation, one is (unspec [op1 op2] UNSPEC_MASK_EQ), the
other is (unspec [op1, op2, const_int 0] UNSPEC_PCMP), which increases
the maintenance burden, such as optimization (not: vpcmpeqb)
to (vpcmpneqb) requires two de
This reverts commit 7218c2ec365ce95f5a1012a6eb425b0a36aec6bf.
PR middle-end/102133
---
gcc/expmed.c | 103 +--
1 file changed, 25 insertions(+), 78 deletions(-)
diff --git a/gcc/expmed.c b/gcc/expmed.c
index f083d6e86d0..3143f38e057 100644
---
o see whether binaries are the same as
HEAD~2; I guess they're the same.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578189.html.
liuhongt (2):
Revert "Make sure we're playing with integral modes before call
extract_integral_bit_field."
Get rid of all f
gcc/ChangeLog:
* emit-rtl.c (validate_subreg): Get rid of all float-int
special cases.
---
gcc/emit-rtl.c | 40
1 file changed, 40 deletions(-)
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index ff3b4449b37..77ea8948ee8 100644
--- a/gcc/em
In vectorizable_nonlinear_induction, r13-2503-gc13223b790bbc5 prevents
variable peeling by only checking LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo).
But when
"!vect_use_loop_mask_for_alignment_p (loop_vinfo) &&
LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0", the vectorizer will
still do variable peel
For ifloor/lfloor/iceil/lceil/irint/lrint/iround/lround, when the size of
in_mode is not equal to that of out_mode, the vectorizer doesn't go the
internal fn way, so that part is still left in
ix86_builtin_vectorized_function.
Remove the other builtins and add corresponding expanders.
Note the patch just refactors the code,
There's a peephole2 submitted in the 1990s which splits cmp mem, 0 into
load mem, reg + test reg, reg. I don't know the exact reason why GCC does this.
For the latest x86 processors, ciscization should help the processor frontend
and also code size; for the processor backend, they should be the same (same
uops).
So the patch de
For Skylake based processor, decoder is 4-way.
For Sunny Cove and Willow Cove, decoder is 5-way.
For Golden Cove, decoder is 6-way.
Bootstrapped and regtested on x86-64-pc-linux-gnu{-m32,}.
Ready to install.
gcc/ChangeLog:
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Adjust for
Here's the list the patch supports:
rint/nearbyint/ceil/floor/trunc/lrint/lceil/lfloor/round/lround.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ok for trunk?
gcc/ChangeLog:
PR target/106910
* config/i386/mmx.md (nearbyintv2sf2): New expander.
(rintv2sf2): Ditt
The code in vectorizable_induction for slp_node assumes all phi_info
have the same induction type (vect_step_op_add), but since we support
nonlinear induction, it could be handled wrongly.
So the patch returns false when slp_node has mixed induction types.
Note codes in other place will still vectorize the
When init_expr is INTEGER_CST or REAL_CST, can_vec_perm_const_p is not
necessary since there's no real vec_perm needed, but
vec_gen_perm_mask_checked will gcc_assert (can_vec_perm_const_p). So
it's better to use vec_gen_perm_mask_any in
vect_create_nonlinear_iv_init.
Bootstrapped and regtested on
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Verified that 526.blend_r can be rebuilt with the fix.
Ok for trunk?
gcc/ChangeLog:
PR target/106994
* config/i386/mmx.md (floorv2sf2): Fix typo, use
register_operand instead of vector_operand for operands[1].
gcc/testsu
x86 has shufps, which shuffles the first operand to the lower 64 bits,
and the second operand to the upper 64 bits. For
__builtin_shufflevector (op0, op1, 1, 4, 3, 6), it will be veclowered since
can_vec_perm_const_p returns false for the sse2 target.
This patch adds a new function to support 2-operand v4s
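As a portable reference for what such a two-operand permutation computes, the scalar model below selects each result element from the concatenation of op0 and op1. The function name shuffle2_v4sf is hypothetical; it models only the semantics and says nothing about the instruction sequence the patch emits.

```c
/* Scalar model of a two-operand, four-element shuffle: indices
   0..3 select from op0, indices 4..7 select from op1.  */
void
shuffle2_v4sf (float dst[4], const float op0[4], const float op1[4],
	       const int sel[4])
{
  float tmp[4];

  for (int i = 0; i < 4; i++)
    {
      int idx = sel[i] & 7;
      tmp[i] = idx < 4 ? op0[idx] : op1[idx - 4];
    }

  /* Copy through a temporary so dst may alias op0 or op1.  */
  for (int i = 0; i < 4; i++)
    dst[i] = tmp[i];
}
```

With the selector { 1, 4, 3, 6 }, the result interleaves elements of op0 and op1, so it does not match the single-shufps layout of two elements from the first operand followed by two from the second.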
>Missing space before (
Changed.
>> + /* shufps. */
>> + ok = expand_vselect_vconcat(tmp, d->op0, d->op1,
>> + perm1, d->nelt, false);
>
>Ditto.
Changed.
>
>> + /* When lone_idx is not 0, it must from second op(count == 1). */
>> + gcc_assert ((lo