Ping
rebased on latest trunk.
gcc/ChangeLog:
* common.opt (ftree-vectorize): Add Var(flag_tree_vectorize).
* doc/invoke.texi (Options That Control Optimization): Update
documents.
* opts.c (default_options_table): Enable auto-vectorization at
O2 with very-c
Ping.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,},
aarch64-unknown-linux-gnu{-m32,}
Ok for trunk?
gcc/ChangeLog:
PR middle-end/102080
* match.pd: Check mask type when doing cond_op related gimple
simplification.
* tree.c (is_truth_type_for): New funct
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Runtime tests passed under sde{-m32,}.
gcc/ChangeLog:
PR target/87767
* config/i386/i386.c (ix86_print_operand): Handle
V8HF/V16HF/V32HFmode.
* config/i386/i386.h (VALID_BCST_MODE_P): Add HFmode.
*
Besides conversion instructions, pass_rpad also handles scalar
sqrt/rsqrt/rcp/round instructions, while r12-3614 was only meant to
handle conversion instructions, so fix it.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,} w/ configure
--enable-checking=yes,rtl,extra, failed tests are fixed
Hi:
fma/fms/fnma/fnmsv2sf4 are defined only under (TARGET_FMA || TARGET_FMA4).
This patch extends the expanders to TARGET_AVX512VL.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
* config/i386/mmx.md (fmav2sf4): Extend to AVX512 fma.
(f
Pushed to trunk.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr92658-avx512f.c: Refine testcase.
* gcc.target/i386/pr92658-avx512vl.c: Adjust scan-assembler,
only v2di->v2qi truncate is not supported, v4di->v4qi should
be supported.
---
gcc/testsuite/gcc.target/i38
---
htdocs/gcc-12/changes.html | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 81f62fe3..14149212 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -165,8 +165,12 @@ a work-in-progre
expander for smin/maxhf3.
AVX512FP16: Add fix(uns)?_truncmn2 for HF scalar and vector modes
AVX512FP16: Add float(uns)?mn2 expander
AVX512FP16: add truncmn2/extendmn2 expanders
AVX512FP16: Enable vec_cmpmn/vcondmn expanders for HF modes.
liuhongt (2):
AVX512FP16: Add expander for rint
gcc/ChangeLog:
* config/i386/i386.md (rinthf2): New expander.
(nearbyinthf2): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-builtin-round-1.c: Add new testcase.
---
gcc/config/i386/i386.md | 22 +++
.../i386/avx
gcc/ChangeLog:
* config/i386/sse.md (FMAMODEM): Extend to handle FP16.
(VFH_SF_AVX512VL): Extend to handle HFmode.
(VF_SF_AVX512VL): Deleted.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-fma-1.c: New test.
* gcc.target/i386/avx512fp16vl-fma-1.c: N
From: Hongyu Wang
gcc/ChangeLog:
* config/i386/i386.md (hf3): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-builtin-minmax-1.c: New test.
---
gcc/config/i386/i386.md | 11 ++
.../i386/avx512fp16-builtin-minmax-1.c| 35 +++
From: Hongyu Wang
NB: 64-bit/32-bit vectorization for HFmode is not supported for now; this patch
will be adjusted when V2HF/V4HF operations are supported.
gcc/ChangeLog:
* config/i386/i386.md (fix_trunchf2): New expander.
(fixuns_trunchfhi2): Likewise.
(*fixuns_trunchfsi2zext): New de
From: Hongyu Wang
gcc/ChangeLog:
* config/i386/sse.md (float2):
New expander.
(avx512fp16_vcvt2ph_):
Rename to ...
(floatv4hf2): ... this, and drop constraints.
(avx512fp16_vcvtqq2ph_v2di): Rename to ...
(floatv2div2hf2): ... this, and like
From: Hongyu Wang
gcc/ChangeLog:
* config/i386/sse.md (extend2):
New expander.
(extendv4hf2): Likewise.
(extendv2hfv2df2): Likewise.
(trunc2): Likewise.
(avx512fp16_vcvt2ph_): Rename to ...
(truncv4hf2): ... this, and drop constraints.
From: Hongyu Wang
gcc/ChangeLog:
* config/i386/i386-expand.c (ix86_use_mask_cmp_p): Enable
HFmode mask_cmp.
* config/i386/sse.md (sseintvecmodelower): Add HF vector modes.
(_store_mask): Extend to support HF vector modes.
(vec_cmp): Likewise.
(vcon
Updated, mention _Float16 support.
---
htdocs/gcc-12/changes.html | 13 -
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 81f62fe3..f19c6718 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.
Hi:
Related discussion in [1] and PR.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk?
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574330.html
gcc/ChangeLog:
PR target/102464
* config/i386/i386.c (ix86_optab_supported_p):
Return true f
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580207.html
gcc/ChangeLog:
* doc/extend.texi (Half-Precision): Remove storage only
description for _Float16 w/o avx512fp16.
---
gcc/doc/extend.texi | 11 +--
1 file changed, 5 insertions(+), 6 deletions(-)
diff
Hi:
> Please don't add the -fno- option to the warning tests. As I said,
> I would prefer to either suppress the vectorization for the failing
> cases by tweaking the test code or xfail them. That way future
> regressions won't be masked by the option. Once we've moved
> the warning to a more su
Revert due to performance regression.
This reverts commit 8f323c712ea76cc4506b03895e9b991e4e4b2baf.
PR target/102473
PR target/101059
---
gcc/config/i386/sse.md| 39 ++-
gcc/testsuite/gcc.target/i386/sse2-pr101059.c | 32 ---
gcc/tests
Hi:
Add expanders for reduc_{smin,smax,plus}_scal_{v8hf,v16hf,v32hf}
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
gcc/ChangeLog:
* config/i386/i386-expand.c (emit_reduc_half): Handle
V8HF/V16HF/V32HFmode.
* config/i386/sse.md (REDUC_SSE_PLUS_MODE): Add V8HF
Hi:
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/102494
* config/i386/i386-expand.c (emit_reduc_half): Handle V4HImode.
* config/i386/mmx.md (reduc_plus_scal_v4hi): New.
(reduc__scal_v4hi): New.
gcc/testsuite
> (I'm assuming the difference is due to some architectural
> constraints as opposed to arbitrary limitations in the code
There are 2 differences:
1. Whether the target supports unaligned stores.
2. Whether the target supports move by pieces (which enables block move at the
gimple level).
Updated patch.
Adjust c
Sorry for the slow reply.
Here is the update according to the comments:
1. Define a new match function in match.pd.
2. Adjust the code for the comment below
>> + gsi_remove (gsip, true);
>> + var = build1 (NOP_EXPR, TREE_TYPE (use_nop_lhs), var);
>
>instead of building a GENERIC NOP you co
a and b have the same type as the truncation type and have less precision
than the extend type.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/102464
* match.pd: Simplify (trunc)copysign((extend)a, (extend)b) to
.COPYSIGN (a,b) when
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk after first patch is approved.
gcc/ChangeLog:
PR target/101989
* config/i386/predicates.md (reg_or_notreg_operand): Rename to ..
(regmem_or_bitnot_regmem_operand): .. and extend to handle
> Note that this is not safe with -fsignaling-nans, so needs to be disabled
> for that option (if there isn't already logic somewhere with that effect),
> because the extend will convert a signaling NaN to quiet (raising
> "invalid"), but copysign won't, so this transformation could result in a
> s
a and b have the same type as the trunc type and have less precision than
the extend type; the transformation is guarded by flag_finite_math_only.
Bootstrapped and regtested under x86_64-pc-linux-gnu{-m32,}
Ok for trunk?
gcc/ChangeLog:
PR target/102464
* match.pd: Simplify (trunc)fmax/fmin((ex
a, b, c have the same type as the truncation type and have less precision than
the extend type; the optimization is guarded under
flag_unsafe_math_optimizations.
Bootstrapped and regtested under x86_64-pc-linux-gnu{-m32,}
Ok for trunk?
gcc/ChangeLog:
PR target/102464
* match.pd: Simplify
Bootstrapped on x86_64-pc-linux-gnu{-m32,}
Ok for trunk?
gcc/ChangeLog:
PR tree-optimization/103077
* doc/invoke.texi (Options That Control Optimization):
Update documentation for -ftree-loop-vectorize and
-ftree-slp-vectorize which are enabled by default at -O2.
This will enable transformation like
- # sum1_50 = PHI
- # sum2_52 = PHI
+ # sum1_50 = PHI <_87(13), 0(4)>
+ # sum2_52 = PHI <_89(13), 0(4)>
# ivtmp_62 = PHI
i.2_7 = (long unsigned int) i_49;
_8 = i.2_7 * 8;
...
vec1_i_38 = vec1_29 >> _10;
vec2_i_39 = vec2_31 >> _10;
_11 =
> >
> > +#if GIMPLE
> > +(match (nop_atomic_bit_test_and_p @0 @1)
> > + (bit_and:c (nop_convert?@4 (ATOMIC_FETCH_OR_XOR_N @2 INTEGER_CST@0 @3))
> > + INTEGER_CST@1)
>
> no need for the :c on the bit_and when the 2nd operand is an
> INTEGER_CST (likewise below)
Changed.
>
> > + (with {
>
This patch fixes ICE in pr103151.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ready to push to trunk.
gcc/ChangeLog:
PR target/103151
* config/i386/sse.md (V_128_256): Extend to V8HF/V16HF.
(avxsizesuffix): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386
r12-5102-gfb161782545224f5 improves integer bit test on
__atomic_fetch_[or|and]_* returns only for nop_convert, i.e.
transform
mask_5 = 1 << bit_4(D);
mask.0_1 = (unsigned int) mask_5;
_2 = __atomic_fetch_or_4 (a_7(D), mask.0_1, 0);
t1_9 = (int) _2;
t2_10 = mask_5 & t1_9;
to
mask_5
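For orientation, here is a minimal C sketch of source code that could lower to GIMPLE of the shape quoted above: an int mask, an unsigned atomic fetch-or, and a bit test on the converted result. The function name test_and_set_bit is hypothetical, and C11 atomic_fetch_or_explicit is used as an assumption in place of the __atomic_fetch_or_4 builtin shown in the dump. It makes no claim about which instruction a given GCC version emits.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: atomically set one bit and report whether it was
   already set.  The casts mirror the int <-> unsigned int conversions
   (nop_convert) visible in the quoted dump.  */
bool test_and_set_bit(atomic_uint *a, int bit)
{
    int mask = 1 << bit;                     /* mask_5 = 1 << bit */
    int old = (int) atomic_fetch_or_explicit(a, (unsigned int) mask,
                                             memory_order_relaxed);
    return (mask & old) != 0;                /* t2 = mask & t1 */
}
```

The bit index is assumed to be in the range 0..30 so that the shift stays within int.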
This is the updated patch.
For i386, it will enable the optimization below:
from
notl %edi
vpbroadcastd %edi, %xmm0
vpand %xmm1, %xmm0, %xmm0
to
vpbroadcastd %edi, %xmm0
vpandn %xmm1, %xmm0, %xmm0
gcc/ChangeLog:
PR target/100711
* simplify-rtx.c (simplify_unary_operat
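The from/to sequences above both compute a broadcast of ~a ANDed with a vector. A plain C sketch of that computation follows. The function name andnot_broadcast4 and the fixed length of 4 are assumptions for illustration; this is not the testcase from the PR, and whether a compiler emits vpandn for it depends on version and flags.

```c
/* Hypothetical scalar model of the computation: every lane is ~a & b[i].
   A vectorizer may turn this into a broadcast followed by an and-not.  */
void andnot_broadcast4(int *out, int a, const int *b)
{
    for (int i = 0; i < 4; i++)
        out[i] = ~a & b[i];
}
```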
Use "used" flag for CALL_INSN to indicate it's a fake call. If it's a
fake call, it won't have its own function stack.
gcc/ChangeLog
PR target/82735
* df-scan.c (df_get_call_refs): When call_insn is a fake call,
it won't use stack pointer reg.
* final.c (leaf_funct
When __builtin_ia32_vzeroupper is called explicitly, the corresponding
vzeroupper pattern does not carry any CLOBBERS or SETs before LRA,
which leads to incorrect optimization in pass_reload. In order to
solve this problem, this patch refines the instructions as call_insns in
which the call has a specia
For EVEX-encoded extended instructions, when the vector length is less
than 512 bits, AVX512VL is needed; in addition, some instructions like
vpmovzxbx need AVX512BW as well. So this patch refines the corresponding
constraints, i.e. from "v/vm" to "Yv/Yvm", from "v/vm" to "Yw/Ywm".
Bootstrapped and regtested on
---
htdocs/gcc-12/changes.html | 9 +
1 file changed, 9 insertions(+)
diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 22839f2d..6e898db7 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -68,6 +68,15 @@ a work-in-progress.
General Im
For AVX512-FP16, HFmode only supports vcmpsh whose dest is mask
register, so for movhfcc, it's
vcmpsh op2, op1, %k1
vmovsh op1, op2{%k1}
mov op2, dest
gcc/ChangeLog:
PR target/102639
* config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Handle
HFmode.
(ix86_use_
Pushed to trunk.
libgomp/ChangeLog:
* testsuite/libgomp.c++/scan-10.C: Add option -fvect-cost-model=cheap.
* testsuite/libgomp.c++/scan-11.C: Ditto.
* testsuite/libgomp.c++/scan-12.C: Ditto.
* testsuite/libgomp.c++/scan-13.C: Ditto.
* testsuite/libgomp.c++/
libgomp/ChangeLog:
* testsuite/libgomp.graphite/force-parallel-8.c: Add
-fno-tree-vectorize.
---
libgomp/testsuite/libgomp.graphite/force-parallel-8.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libgomp/testsuite/libgomp.graphite/force-parallel-8.c
b/libgomp/test
gcc/testsuite/ChangeLog:
PR middle-end/102669
* gnat.dg/unroll1.adb: Add -fno-tree-vectorize.
---
gcc/testsuite/gnat.dg/unroll1.adb | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gnat.dg/unroll1.adb
b/gcc/testsuite/gnat.dg/unroll1.adb
index 34d8
After providing expanders for reduc_umin/umax/smin/smax_scal_v4qi,
reduce operations are slightly faster than before
w/ options -O2 -march=haswell, -O2 -march=skylake-avx512
and -Ofast -march=skylake-avx512.
gcc/ChangeLog
PR target/102483
* config/i386/i386-ex
As discussed in PR.
It looks like it's just the location of the warning that's off,
the warning itself is still issued but it's swallowed by the
dg-prune-output directive.
Since the test was added to verify the fix for an ICE without
vectorization I think disabling vectorization should be fine
Hi Kewen:
Could you help verify whether this patch fixes those regressions
for the rs6000 port?
As discussed in [1], this patch adds xfail/target selectors to those
testcases, and also makes a copy of them so that they can be tested w/o
vectorization.
Newly added xfail/target selectors are used to check the v
Updated patch:
1. Add documentation in doc/sourcebuild.texi (Effective-Target Keywords).
2. Reduce the -novec.c testcases to contain only the newly failing parts that
are caused by O2 vectorization.
3. Add PR in dg-warning comment.
As discussed in [1], this patch adds xfail/target selectors to those
testcas
Hi:
This patch tries to canonicalize the bit_and and nop_convert order for
__atomic_fetch_or_*, __atomic_fetch_xor_*,
__atomic_xor_fetch_*,__sync_fetch_and_or_*,
__sync_fetch_and_xor_*,__sync_xor_and_fetch_*,
__atomic_fetch_and_*,__sync_fetch_and_and_* when mask is constant.
i.e.
+/* Canonicalize
+
Similar for sqrt/sqrtl.
gcc/ChangeLog:
PR target/102464
* match.pd: Simplify (_Float16) sqrtf((float) a) to .SQRT(a)
when direct_internal_fn_supported_p, similar for sqrt/sqrtl.
gcc/testsuite/ChangeLog:
PR target/102464
* gcc.target/i386/pr102464-sqrtph.c
Canonicalize & and nop_convert order for
__atomic_fetch_or_*, __atomic_fetch_xor_*,
__atomic_xor_fetch_*,__sync_fetch_and_or_*,
__sync_fetch_and_xor_*,__sync_xor_and_fetch_*,
__atomic_fetch_and_*,__sync_fetch_and_and_* when mask is constant.
i.e.
+/* Canonicalize
+ _1 = __atomic_fetch_or_4 (&v,
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/102464
* config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF): New
function type.
(V16HF_FTYPE_V16HF): Ditto.
(V32HF_FTYPE_V32HF): Ditto.
(V8HF_FTYP
Adjust code in check_vect_slp_aligned_store_usage to make it an exact
pattern match of the corresponding testcases.
These new target/xfail selectors are added as a temporary solution,
and should be removed after real issue is fixed for Wstringop-overflow.
gcc/ChangeLog:
* doc/sourcebuild.
By optimizing vector movement to broadcast in ix86_expand_vector_move
during pass_expand, pass_reload/LRA can automatically generate an avx512
embedded broadcast, pass_cpb is not needed.
Considering that in the absence of avx512f, broadcast from memory is
still slightly faster than loading the ent
Hi:
As mentioned in https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
cut start-
> note for the lowpart we can just view-convert away the excess bits,
> fully re-using the mask. We generate surprisingly "good" code:
>
> kmovb %k1, %edi
> shrb $4, %dil
>
Hi:
As discussed in [1], this patch supports _Float16 under target sse2
and above. W/o avx512fp16, the _Float16 type is storage only, and all operations
are emulated by soft-fp and float instructions. Soft-fp keeps the intermediate
result of the operation at 32-bit precision by default, which may lead to
libgcc/ChangeLog
* soft-fp/eqhf2.c: New file.
* soft-fp/extendhfdf2.c: New file.
* soft-fp/extendhfsf2.c: New file.
* soft-fp/extendhfxf2.c: New file.
* soft-fp/half.h (FP_CMP_EQ_H): New macro.
* soft-fp/truncdfhf2.c: New file
* soft-fp/trunc
gcc/ChangeLog:
* config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
* config/i386/i386.c (enum x86_64_reg_class): Add
X86_64_SSEHF_CLASS.
(merge_classes): Handle X86_64_SSEHF_CLASS.
(examine_argument): Ditto.
(construct_container): Ditto.
gcc/ChangeLog:
* optabs-query.c (get_best_extraction_insn): Use word_mode for
HF field.
libgcc/ChangeLog:
* config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro.
* config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto.
* config/i386/sfp-machine.h (_FP_NAN
From: "Guo, Xuepeng"
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_available_features):
Detect FEATURE_AVX512FP16.
* common/config/i386/i386-common.c
(OPTION_MASK_ISA_AVX512FP16_SET,
OPTION_MASK_ISA_AVX512FP16_UNSET,
OPTION_MASK_ISA2_AVX512FP1
gcc/ChangeLog:
* config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
(_mm256_set_ph): Likewise.
(_mm512_set_ph): Likewise.
(_mm_setr_ph): Likewise.
(_mm256_setr_ph): Likewise.
(_mm512_setr_ph): Likewise.
(_mm_set1_ph): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/m512-check.h: Add union128h, union256h, union512h.
* gcc.target/i386/avx512fp16-10a.c: New test.
* gcc.target/i386/avx512fp16-10b.c: Ditto.
* gcc.target/i386/avx512fp16-1a.c: Ditto.
* gcc.target/i386/avx512fp16-1b.c
From: "H.J. Lu"
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-vararg-1.c: New test.
* gcc.target/i386/avx512fp16-vararg-2.c: Ditto.
* gcc.target/i386/avx512fp16-vararg-3.c: Ditto.
* gcc.target/i386/avx512fp16-vararg-4.c: Ditto.
---
.../gcc.target/i386/avx
gcc/testsuite/ChangeLog:
* gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp:
New exp file.
* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: New header.
* gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h:
Likewise.
* gcc.targ
gcc/testsuite/ChangeLog:
* gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp:
New file.
* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S: Likewise.
* gcc.target/x86_64/abi/avx512fp1
Hi:
As described in PR 39821, WIDEN_MULT_EXPR should use a different cost
model from MULT_EXPR; this patch adds ix86_widen_mult_cost for that.
Reference basis for the cost model is https://godbolt.org/z/EMjaz4Knn.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
gcc/ChangeLog:
*
Committed as obvious fix, and opened pr101668 to record the issue related
to pr92658-{avx512bw-2,sse4-2,sse4}.c.
gcc/testsuite/ChangeLog:
PR target/99881
* gcc.target/i386/pr91446.c: Adjust testcase.
* gcc.target/i386/pr92658-avx512bw-2.c: Ditto.
* gcc.target/i38
Don't add crtfastmath.o for -shared to avoid changing the MXCSR
register when loading a shared library. crtfastmath.o will be used
only when building executables.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/55522
PR target/368
Update in V2:
Split -shared change into a separate commit and add some documentation
for it.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
Don't add crtfastmath.o for -shared to avoid changing the MXCSR register
when loading a shared library. crtfastmath.o will be used onl
Update in v2:
1. Support -mno-daz-ftz, and make the option effectively three-state as:
   if (mdaz-ftz)
     link crtfastmath.o
   else if ((Ofast || ffast-math || funsafe-math-optimizations)
            && !shared && !mno-daz-ftz)
     link crtfastmath.o
   else
     Don't link crtfastmath.o
2. Still make the op
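The three-state rule quoted in item 1 above can be written as a small C predicate. This is only a sketch of the decision logic as stated in the message, not the spec-file implementation. The enum, its values, and the function name links_crtfastmath are hypothetical, and the fast_math parameter stands for (Ofast || ffast-math || funsafe-math-optimizations).

```c
#include <stdbool.h>

/* Hypothetical three-state model of the option: not given on the command
   line, -mdaz-ftz, or -mno-daz-ftz.  */
enum daz_ftz_state { DAZ_FTZ_UNSET, DAZ_FTZ_ON, DAZ_FTZ_OFF };

/* Returns true when crtfastmath.o would be linked under the quoted rule.  */
bool links_crtfastmath(enum daz_ftz_state daz_ftz, bool fast_math, bool shared)
{
    if (daz_ftz == DAZ_FTZ_ON)
        return true;                 /* explicit -mdaz-ftz always links it */
    if (fast_math && !shared && daz_ftz != DAZ_FTZ_OFF)
        return true;                 /* fast-math executable, no opt-out */
    return false;                    /* otherwise don't link it */
}
```

Under this reading, a fast-math build with -shared links crtfastmath.o only when -mdaz-ftz is given explicitly.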
Patches [1] and [2] fixed PR55522 for x86-linux but left all other x86
targets unfixed (x86-cygwin, x86-darwin and x86-mingw32).
This patch applies a similar change to other specs using crtfastmath.o.
Ok for trunk?
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608528.html
[2] https:
Both "graniterapids-d" and "graniterapids" are associated with
PROCESSOR_GRANITERAPIDS in processor_alias_table but mapped to
different __cpu_subtype in get_intel_cpu.
And get_builtin_code_for_version will try to match the first
PROCESSOR_GRANITERAPIDS in processor_alias_table which maps to
"granitepr
Merge V_128H and V_256H into V_128 and V_256, adjust related patterns.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
gcc/ChangeLog:
* config/i386/sse.md (vec_set): Removed.
(V_128H): Merge into ..
(V_128): .. this.
(V_256H): Merge
vpmaskmov{d,q} is available for TARGET_AVX2, vmaskmov{ps,pd} is
available for TARGET_AVX; w/o TARGET_AVX2, we can use vmaskmov{ps,pd}
for VI48_128_256.
Bootstrapped and regtested on x86_64-pc-linux{-m32,}.
Ready to push to trunk.
gcc/ChangeLog:
PR target/19
* config/i386/sse.md (
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
gcc/ChangeLog:
* config/i386/sse.md (_blendm): Merge
VF_AVX512HFBFVL into VI12HFBF_AVX512VL.
(VF_AVX512HFBF16): Renamed to VHFBF.
(VF_AVX512FP16VL): Renamed to VHF_AVX512VL.
(VF_
r14-332-g24905a4bd1375c adjusts costing of emulated vectorized
gather/scatter.
commit 24905a4bd1375ccd99c02510b9f9529015a48315
Author: Richard Biener
Date: Wed Jan 18 11:04:49 2023 +0100
Adjust costing of emulated vectorized gather/scatter
Emulated gather/scatter behave similar to
On SPR, vmovsh can be executed on 3 ports, vpblendw can only be
executed on 2 ports.
On znver4, vpblendw can be executed on 4 ports; if vmovsh is similar
to vmovss, then it can also be executed on 4 ports.
So there's no difference for znver? but vmovsh is more optimized on
SPR.
Bootstrapped and reg
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
gcc/ChangeLog:
* config/i386/sse.md
(_vpermt2var3): New define_insn.
(VHFBF_AVX512VL): New mode iterator.
(VI2HFBF_AVX512VL): New mode iterator.
---
gcc/config/i386/sse.md | 32
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} on SPR.
Ready to push to trunk and backport to GCC13/GCC12.
gcc/ChangeLog:
PR target/111306
* config/i386/sse.md (int_comm): New int_attr.
(fma__):
Remove % for Complex conjugate operations since they're not
Here's the patch I've committed.
The patch also removes % for vfmaddcph.
gcc/ChangeLog:
PR target/111306
PR target/111335
* config/i386/sse.md (int_comm): New int_attr.
(fma__):
Remove % for Complex conjugate operations since they're not
commutative.
Memory attribute auto detection will check operand 2 for type sselog,
and check operand 1 for type sselog1. For below 2 insns, there's no
operand 2. Change type to sselog1.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/107540
* c
libsanitizer
* configure.tgt: Enable hwasan for x86-64.
---
libsanitizer/configure.tgt | 1 +
1 file changed, 1 insertion(+)
diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index 87d8a2c3820..72385a4a39d 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/confi
gcc/ChangeLog:
* config/i386/i386-opts.h (enum lam_type): New enum.
* config/i386/i386.c (ix86_memtag_can_tag_addresses): New.
(ix86_memtag_set_tag): Ditto.
(ix86_memtag_extract_tag): Ditto.
(ix86_memtag_add_tag): Ditto.
(ix86_memtag_tag_size): Ditto
}.
Ok for trunk?
liuhongt (2):
Implement hwasan target_hook.
Enable hwasan for x86-64.
gcc/config/i386/i386-expand.cc | 12
gcc/config/i386/i386-options.cc | 3 +
gcc/config/i386/i386-opts.h | 6 ++
gcc/config/i386/i386-protos.h | 2 +
gcc/config/i386/i386.c
This should fix the incorrect error when calling those builtins with
-mavxneconvert and w/o -mavx512bf16 -mavx512vl.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ready to push to trunk.
gcc/ChangeLog:
* config/i386/i386-builtins.cc (def_builtin): Handle "shared"
avx512bf16vl-
-Jakub's comments--
That said, these fundamental types whose presence/absence depends on ISA flags
are quite problematic IMHO, as they are incompatible with the target
attribute/pragmas. Whether they are available or not available depends on
whether in this case SSE2 is enabled during c
/* If this insn loads a parameter from its stack slot, then it
   represents a savings, rather than a cost, if the parameter is
   stored in memory. Record this fact.

   Similarly if we're loading other constants from memory (constant
   pool, TOC references, sma
After optimization for RA, memory op is not propagated into
instructions(>1), and it makes testcases not generate vxorps since
the memory is loaded into the dest, and the dest is never unused now.
So rewrite testcases to make the codegen more stable.
gcc/testsuite/ChangeLog:
* gcc.target/
Use swap_commutative_operands_p for canonicalization. When both values
have the same operand precedence value, the first bit in the mask should
select the first operand.
The canonicalization should help backends with pattern matching, i.e. the x86
backend has lots of vec_merge patterns, combine will create any f
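As an illustration of why swapping is safe, the following C sketch models vec_merge on 4-element arrays, where element i comes from the first operand when bit i of the mask is set. Swapping the operands and inverting the mask leaves the result unchanged, so one of the two forms can be chosen as canonical. The names vec_merge4 and canonicalize_merge are hypothetical, and the sketch ignores operand precedence entirely; it only shows the "first bit selects first operand" tie-break described above.

```c
#define NELTS 4

/* Model of vec_merge: element i comes from a when bit i of mask is set,
   otherwise from b.  */
void vec_merge4(int *out, const int *a, const int *b, unsigned mask)
{
    for (int i = 0; i < NELTS; i++)
        out[i] = ((mask >> i) & 1u) ? a[i] : b[i];
}

/* Hypothetical tie-break: if bit 0 of the mask does not select the first
   operand, swap the operands and invert the mask.  The merged value is
   the same either way.  */
void canonicalize_merge(const int **a, const int **b, unsigned *mask)
{
    if ((*mask & 1u) == 0) {
        const int *tmp = *a;
        *a = *b;
        *b = tmp;
        *mask = ~*mask & ((1u << NELTS) - 1u);
    }
}
```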
> But for the C++23 macros, more importantly I think we really should
> also in ix86_target_macros_internal add
> if (c_dialect_cxx ()
> && cxx_dialect > cxx20
> && (isa_flag & OPTION_MASK_ISA_SSE2))
> {
> def_or_undef (parse_in, "__STDCPP_FLOAT16_T__");
> def_or_undef
> > + if (!TARGET_SSE2)
> > +{
> > + if (c_dialect_cxx ()
> > + && cxx_dialect > cxx20)
>
> Formatting, both conditions are short, so just put them on one line.
Changed.
> But for the C++23 macros, more importantly I think we really should
> also in ix86_target_macros_internal add
Ready to push to trunk.
gcc/testsuite/ChangeLog:
PR tree-optimization/109011
* gcc.target/i386/pr109011-b1.c: New test.
* gcc.target/i386/pr109011-b2.c: New test.
* gcc.target/i386/pr109011-d1.c: New test.
* gcc.target/i386/pr109011-d2.c: New test.
* g
Similar to WIDEN FLOAT_EXPR, when a direct optab does not exist, try
an intermediate integer type whenever gimple ranger can tell it's safe.
i.e.
When there's no direct optab for vector long long -> vector float, but
the value range of integer can be represented as int, try vector int
-> vector floa
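The scalar analogue of this idea can be sketched in C: when a long long value is known to fit in int, converting through int gives the same float as converting directly, because the narrowing step is exact. The two function names below are hypothetical, and the range guarantee is assumed to come from the caller, standing in for what the ranger would prove.

```c
/* Direct conversion, the operation for which no vector optab exists.  */
float ll_to_float_direct(long long x)
{
    return (float) x;
}

/* Conversion through an intermediate int.  Only valid when the caller
   guarantees that x is representable as int.  */
float ll_to_float_via_int(long long x)
{
    return (float) (int) x;
}
```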
Here's the updated patch with documentation in md.texi.
Ok for trunk?
--
Use swap_commutative_operands_p for canonicalization. When both values
have the same operand precedence value, the first bit in the mask should
select the first operand.
The canonicalization should help backends for pattern match
r14-172-g0368d169492017 uses NO_REGS instead of GENERAL_REGS in memory cost
calculation when the preferred register class is unknown.
+ /* Costs for NO_REGS are used in cost calculation on the
+1st pass when the preferred register classes are not
+known yet. In this case we take the
> > @@ -4799,7 +4800,8 @@ vect_create_vectorized_demotion_stmts (vec_info
> > *vinfo, vec *vec_oprnds,
> >stmt_vec_info stmt_info,
> >vec &vec_dsts,
> >gimple_stmt_iterator *gsi,
The patch doesn't handle:
1. cast64_to_32,
2. memory source with rsize < range.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR middle-end/108938
* gimple-ssa-store-merging.cc (is_bswap_or_nop_p): New
function, cut from origin
The official name is AVX512-FP16.
Ready to push to trunk.
gcc/ChangeLog:
* config/i386/i386.opt: Change AVX512FP16 to AVX512-FP16.
* doc/invoke.texi: Ditto.
---
gcc/config/i386/i386.opt | 2 +-
gcc/doc/invoke.texi | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
Ready to push to trunk.
---
htdocs/gcc-12/changes.html | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 30fa4d6e..49055ffe 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -754,7 +754,7 @@