https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112532
--- Comment #3 from Hongtao.liu ---
Mine.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104
--- Comment #5 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #4)
> Fixed via r14-5428-gfd1596f9962569afff6c9298a7c79686c6950bef .
Note, my patch only handles constant tripcount for XOR, but does not do the
transformation when tripcoun
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #12 from Hongtao.liu ---
> So the testsuite without bootstrap is really unchanged? We still have a
Yes, no extra regressions were observed in the gcc testsuite (both w/ and w/o
--with-arch=skylake-avx512 --with-cpu=skylake-avx512 in config
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #10 from Hongtao.liu ---
The patch below passes bootstrap with --with-arch=skylake-avx512
--with-cpu=skylake-avx512, but I didn't observe any obvious typo/bug in the pattern.
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9ee
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106402
--- Comment #3 from Hongtao.liu ---
(In reply to Thomas Koenig from comment #2)
> It would make sense to have it, I guess. If somebody has access
> to the relevant hardware, it could also be tested :-)
x86 supports _Float16 operations with floa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110966
--- Comment #6 from Hongtao.liu ---
(In reply to Thomas Koenig from comment #5)
> (In reply to Hongtao.liu from comment #4)
> > (In reply to anlauf from comment #3)
> > > (In reply to Hongtao.liu from comment #2)
> > > > (In reply to Richard Bie
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112496
--- Comment #3 from Hongtao.liu ---
(In reply to Richard Biener from comment #2)
> if (TREE_CODE (init_expr) == INTEGER_CST)
> init_expr = fold_convert (TREE_TYPE (vectype), init_expr);
> else
> gcc_assert (tree_nop_conversion_p (TREE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #9 from Hongtao.liu ---
When I remove all cond_ patterns, it passes bootstrap. I'll continue to
root-cause the exact pattern which causes the bootstrap failure.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443
--- Comment #7 from Hongtao.liu ---
Should be fixed in GCC14/GCC13/GCC12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443
--- Comment #1 from Hongtao.liu ---
The patch below fixes that; there's a typo in 2 splitters.
@@ -17082,7 +17082,7 @@ (define_insn_and_split "*avx2_pcmp3_4"
(match_dup 4))]
UNSPEC_BLENDV))]
{
- if (INTVAL (operands[5])
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112441
Hongtao.liu changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112441
Bug ID: 112441
Summary: Comparing stages 2 and 3 Bootstrap comparison failure!
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Comp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112393
--- Comment #5 from Hongtao.liu ---
Fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112393
--- Comment #3 from Hongtao.liu ---
Yes, it should return true if d->testing_p is set instead of generating RTL code.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707
Hongtao.liu changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383
--- Comment #5 from Hongtao.liu ---
It's fixed in GCC12.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105034
Hongtao.liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 101956, which changed state.
Bug 101956 Summary: Miss vectorization from v4hi to v4df
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101956
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101956
Hongtao.liu changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015
--- Comment #4 from Hongtao.liu ---
> So here we have a reduction for MAX_EXPR, but there's 2 MAX_EXPR which can
> be merge together with MAX_EXPR >
>
Created PR112324.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112324
Bug ID: 112324
Summary: phiopt fail to recog if (b < 0) max = MAX(-b, max);
else max = MAX (b, max) into max = MAX (ABS(b), max)
Product: gcc
Version: 14.0
Status: UNCON
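The equivalence named in the summary can be checked with a small C sketch. The function names `max_branchy` and `max_abs` are hypothetical and only illustrate the two source-level forms; this is not the phiopt implementation, and `INT_MIN` is excluded because negating it overflows.

```c
#include <assert.h>
#include <limits.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Branchy form, as written in the PR summary. */
int max_branchy(int b, int max)
{
    assert(b != INT_MIN);   /* -b would overflow for INT_MIN */
    if (b < 0)
        max = MAX(-b, max);
    else
        max = MAX(b, max);
    return max;
}

/* Single-MAX form the summary asks for: max = MAX (ABS (b), max). */
int max_abs(int b, int max)
{
    assert(b != INT_MIN);
    int abs_b = b < 0 ? -b : b;
    return MAX(abs_b, max);
}
```

Both functions return the same value for every `b` other than `INT_MIN`, which is why the two-branch reduction can in principle be treated as a single MAX reduction.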
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015
--- Comment #3 from Hongtao.liu ---
test.c:85:23: note: vect_is_simple_use: operand max_38 = PHI , type of def: unknown
test.c:85:23: missed: Unsupported pattern.
test.c:62:24: missed: not vectorized: unsupported use in stmt.
t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112276
--- Comment #8 from Hongtao.liu ---
Fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104
--- Comment #3 from Hongtao.liu ---
We already have analyze_and_compute_bitop_with_inv_effect, but it only works
when inv is an SSA_NAME; it should be extended to handle constants.
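As an illustration of the effect being discussed, here is a minimal C sketch of how a loop that repeatedly XORs an invariant into a variable collapses to a closed form when the trip count is known. The names `xor_loop`, `xor_closed`, and `check_xor_forms` are hypothetical, and this only models the arithmetic, not GCC's final-value replacement code.

```c
#include <assert.h>

/* Loop form: apply j ^= inv exactly n times. */
unsigned xor_loop(unsigned j, unsigned inv, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        j ^= inv;
    return j;
}

/* Closed form: XORing the same value twice cancels out,
   so only the parity of the trip count matters. */
unsigned xor_closed(unsigned j, unsigned inv, unsigned n)
{
    return (n & 1u) ? (j ^ inv) : j;
}

/* Compare both forms over a range of small trip counts. */
void check_xor_forms(void)
{
    for (unsigned n = 0; n < 16; n++)
        assert(xor_loop(5u, 1u, n) == xor_closed(5u, 1u, n));
}
```

The closed form is the same whether `inv` is a variable or a literal constant such as 1, which is the case the comment says should also be handled.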
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112276
--- Comment #4 from Hongtao.liu ---
-(define_split
- [(set (match_operand:V2HI 0 "register_operand")
-(eq:V2HI
- (eq:V2HI
-(us_minus:V2HI
- (match_operand:V2HI 1 "register_operand")
- (matc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112276
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972
--- Comment #7 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #3)
> First off does this even make sense to vectorize but rather do some kind of
> scalar reduction with respect to j = j^1 here . Filed PR 112104 for that.
>
> Basi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972
--- Comment #6 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #5)
> Oh this is the original code:
> https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/whets.c
>
Yes, it's from unixbench.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111833
--- Comment #5 from Hongtao.liu ---
It's the same issue as PR111820, so it should be fixed as well.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820
--- Comment #15 from Hongtao.liu ---
(In reply to Richard Biener from comment #13)
> (In reply to Hongtao.liu from comment #12)
> > Fixed in GCC14, not sure if we want to backport the patch.
> > If so, the patch needs to be adjusted since GCC13
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972
Hongtao.liu changed:
What|Removed |Added
CC||pinskia at gcc dot gnu.org
Compo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972
Bug ID: 111972
Summary: [14 regression] missed vectorzation for bool a = j !=
1; j = (long int)a;
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: norm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111874
--- Comment #3 from Hongtao.liu ---
> For the case of conditional (or loop masked) fold-left reductions the scalar
> fallback isn't implemented. But AVX512 has vpcompress that could be used
> to implement a more efficient sequence for a masked
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820
--- Comment #12 from Hongtao.liu ---
Fixed in GCC14, not sure if we want to backport the patch.
If so, the patch needs to be adjusted since GCC13 doesn't support auto_mpz.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111874
--- Comment #1 from Hongtao.liu ---
For integers, we have _mm512_mask_reduce_add_epi32 defined as
extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_mask_reduce_add_epi32 (__mmask16 __U, __m512i __A)
{
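For readers unfamiliar with the intrinsic, the following scalar C sketch models what a masked add-reduction over sixteen 32-bit lanes computes: lanes whose mask bit is clear contribute zero. The function name `mask_reduce_add_i32` is hypothetical, and this is a reference model, not the header's definition.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a masked add-reduction over 16 lanes of 32-bit integers.
   Bit i of mask selects lane i; unselected lanes are treated as zero. */
int32_t mask_reduce_add_i32(uint16_t mask, const int32_t lanes[16])
{
    assert(lanes);
    uint32_t sum = 0;   /* unsigned accumulator gives wraparound arithmetic */
    for (int i = 0; i < 16; i++)
        if ((mask >> i) & 1u)
            sum += (uint32_t)lanes[i];
    /* Conversion back to int32_t is implementation-defined for large
       values; GCC wraps modulo 2^32. */
    return (int32_t)sum;
}
```

With lanes holding 1 through 16, a mask of 0x0005 selects lanes 0 and 2, and an all-ones mask sums every lane.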
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111859
--- Comment #1 from Hongtao.liu ---
Could be reproduced with:
tar zxvf 521.tar.gz
cd 521
gfortran module_advect_em.fppizedi.f90 -S -O2 -march=cascadelake --param
vect-partial-vector-usage=2 -std=legacy -fconvert=big-endian
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111859
Bug ID: 111859
Summary: 521.wrf_r build failure with -O2 -march=cascadelake
--param vect-partial-vector-usage=2
Product: gcc
Version: 14.0
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820
--- Comment #9 from Hongtao.liu ---
> But we end up here with niters_skip being INTEGER_CST and ..
>
> > 1421 || (!vect_use_loop_mask_for_alignment_p (loop_vinfo)
>
> possibly vect_use_loop_mask_for_alignment_p. Note
> LOOP_VINFO_PEELIN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820
--- Comment #7 from Hongtao.liu ---
(In reply to rguent...@suse.de from comment #6)
> On Mon, 16 Oct 2023, crazylht at gmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820
> >
> > --- Comment #5 from Hongtao.liu ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111829
--- Comment #4 from Hongtao.liu ---
(In reply to Richard Biener from comment #2)
> You sink the conversion, so it would be PRE on the reverse graph. The
> transform doesn't really fit a particular pass I think.
The conversions also need to be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111829
--- Comment #3 from Hongtao.liu ---
(In reply to Richard Biener from comment #2)
> You sink the conversion, so it would be PRE on the reverse graph. The
> transform doesn't really fit a particular pass I think.
>
> Why does the problem persist
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820
--- Comment #5 from Hongtao.liu ---
(In reply to Richard Biener from comment #3)
> for (unsigned i = 0; i != skipn - 1; i++)
> begin = wi::mul (begin, wi::to_wide (step_expr));
>
> (gdb) p skipn
> $5 = 4294967292
>
> niters i
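To show why a trip count such as 4294967292 is a problem for the quoted loop, here is a hedged C sketch comparing repeated multiplication with square-and-multiply on wrapping 32-bit values. The names `mul_pow_naive` and `mul_pow_fast` are hypothetical; this illustrates the general technique only and is not the patch that was applied for this PR.

```c
#include <stdint.h>

/* Naive form: multiply by step once per iteration, like the quoted loop.
   The running time grows linearly with count. */
uint32_t mul_pow_naive(uint32_t begin, uint32_t step, uint32_t count)
{
    for (uint32_t i = 0; i != count; i++)
        begin *= step;
    return begin;
}

/* Square-and-multiply: computes begin * step^count modulo 2^32
   in at most 32 loop iterations. */
uint32_t mul_pow_fast(uint32_t begin, uint32_t step, uint32_t count)
{
    uint32_t result = begin;
    uint32_t base = step;
    while (count != 0) {
        if (count & 1u)
            result *= base;
        base *= base;
        count >>= 1;
    }
    return result;
}
```

Both functions agree for any inputs because unsigned multiplication wraps consistently; only the second stays cheap when the count is close to 2^32.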
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820
--- Comment #4 from Hongtao.liu ---
> niters is 4294967292 in vect_update_ivs_after_vectorizer. Maybe the loop
> should terminate when begin is zero. But I wonder why we pass in 'niters'
> and then name it 'skip_niters' ...
>
It's coming from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111829
--- Comment #1 from Hongtao.liu ---
ivtmp.23_31 = (unsigned long) b_24(D);
ivtmp.24_46 = (unsigned long) pa_26(D);
_50 = ivtmp.23_31 + 40;
[local count: 1063004408]:
# vsum_35 = PHI
# ivtmp.23_14 = PHI
# ivtmp.24_30 = PHI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111829
Bug ID: 111829
Summary: Redudant register moves inside the loop
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
P
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768
--- Comment #10 from Hongtao.liu ---
> indeed (but I believe it did happen with Alder Lake already, by accident,
> with AVX512 on P-cores but not on E-cores).
AVX512 is physically fused off on Alderlake P-cores; P-cores and E-cores share
the same
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768
--- Comment #4 from Hongtao.liu ---
I checked Alderlake's L1 cache size and it is indeed 48, while the L1 cache size
in alderlake_cost is set to 32.
But then again, we have a lot of different platforms that share the same cost
and they may have differe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111745
--- Comment #3 from Hongtao.liu ---
Fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104610
--- Comment #22 from Hongtao.liu ---
For 64-byte memory comparison
int compare (const char* s1, const char* s2)
{
return __builtin_memcmp (s1, s2, 64) == 0;
}
We're generating
vmovdqu (%rsi), %ymm0
vpxorq (%rdi), %y
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111745
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731
--- Comment #2 from Hongtao.liu ---
The original project is too complex for me to come up with a reproduction case,
but I can help with gdb if additional information is needed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731
--- Comment #1 from Hongtao.liu ---
GCC11.3 is OK; GCC13.2 and later have the issue. I didn't verify GCC12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731
Bug ID: 111731
Summary: [13/14 regression] gcc_assert is hit at
libgcc/unwind-dw2-fde.c#L291
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111402
--- Comment #2 from Hongtao.liu ---
After adjusting the code in foo1 to use < n instead of != n, the issue remains.
void
foo1 (v4di* __restrict a, v4di *b, int n)
{
for (int i = 0; i < n; i+=2)
{
a[i] = b[i];
a[i+1] = b[i+1];
}
}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111402
Bug ID: 111402
Summary: Loop distribution fail to optimize memmove for
multiple consecutive moves within a loop
Product: gcc
Version: 14.0
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111306
--- Comment #8 from Hongtao.liu ---
Fixed in GCC14.1, GCC13.3, and GCC12.4.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111335
Hongtao.liu changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111306
--- Comment #4 from Hongtao.liu ---
PR111335 is a related report for fmaddcph; it is similar but not the same.
PR111335 is due to a precision difference for complex _Float16 FMA:
fmaddcph a, b, c is not equal to fmaddcph b, a, c.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111335
Bug ID: 111335
Summary: fmaddpch seems not commutative for operands[1] and
operands[2] due to precision loss
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Sev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111306
--- Comment #3 from Hongtao.liu ---
A patch is posted at
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629650.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111333
--- Comment #2 from Hongtao.liu ---
The test has failed since GCC12, when the pattern was added.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111333
--- Comment #1 from Hongtao.liu ---
fmulcph/fmaddcph are commutative for operands[1] and operands[2], but
fcmulcph/fcmaddcph are not, since they are complex conjugate operations.
The change below fixes the issue.
diff --git a/gcc/config/i386/sse.md b/gc
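The commutativity argument can be seen with ordinary complex arithmetic. The sketch below uses a hypothetical `cplx` struct and helper names, works in double precision rather than _Float16, and does not claim which operand the hardware instruction conjugates; it only shows that a product with one conjugated operand changes when the operands are swapped.

```c
/* Minimal complex type so the example has no library dependencies. */
typedef struct { double re, im; } cplx;

cplx cplx_conj(cplx a)
{
    cplx r = { a.re, -a.im };
    return r;
}

/* Plain complex multiply: swapping a and b gives the same result. */
cplx cplx_mul(cplx a, cplx b)
{
    cplx r = { a.re * b.re - a.im * b.im,
               a.re * b.im + a.im * b.re };
    return r;
}

/* Conjugate multiply: one operand is conjugated before the product,
   so swapping a and b flips the sign of the imaginary part. */
cplx conj_mul(cplx a, cplx b)
{
    return cplx_mul(a, cplx_conj(b));
}
```

For a = 1+2i and b = 3+4i, the plain product is -5+10i in either order, while `conj_mul(a, b)` is 11+2i and `conj_mul(b, a)` is 11-2i. A pattern that marks the two operands as interchangeable is therefore only safe for the plain multiply.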
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111333
Bug ID: 111333
Summary: Runtime failure for fcmulcph instrinsic
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111225
--- Comment #2 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #1)
> So reload thought CT_SPECIAL_MEMORY is always a win for spilled_pseudo_p, but
> here Br should be a vec_dup:mem, which doesn't match spilled_pseudo_p.
>
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111225
--- Comment #1 from Hongtao.liu ---
So reload thought CT_SPECIAL_MEMORY is always a win for spilled_pseudo_p, but
here Br should be a vec_dup:mem, which doesn't match spilled_pseudo_p.
case CT_SPECIAL_MEMORY:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111064
--- Comment #6 from Hongtao.liu ---
>
> [liuhongt@intel gather_emulation]$ ./gather.out
> ;./nogather_xmm.out;./nogather_ymm.out
> elapsed time: 1.75997 seconds for gather with 3000 iterations
> elapsed time: 2.42473 seconds for no_gather_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19
--- Comment #5 from Hongtao.liu ---
Fixed in GCC14.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52
--- Comment #2 from Hongtao.liu ---
> With Zen3 -O2 generic lto pgo the regression is less noticeable (only 4%)
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=694.457.0
Not sure about this part
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111064
--- Comment #4 from Hongtao.liu ---
The loop is like
double foo (double* a, unsigned* b, double* c, int n)
{
double sum = 0;
for (int i = 0; i != n; i++)
{
sum += a[i] * c[b[i]];
}
return sum;
}
After disab
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19
--- Comment #3 from Hongtao.liu ---
> I see, we can add an alternative like "noavx2,avx2" to generate
> vmaskmovps/pd when avx2 is not available for integer.
It's better to change the assembly output as
if (TARGET_AVX2)
  return "vma
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
--- Comment #8 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #7)
> (In reply to Hongtao.liu from comment #6)
> > > So, the compiler still expects vec_concat/vec_select patterns to be
> > > present.
> >
> > v2df foo_v2df (v2df x)
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
--- Comment #6 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #4)
> (In reply to Hongtao.liu from comment #3)
> > in x86 backend expand_vec_perm_1, we always try vec_merge first for
> > !one_operand_p, expand_vselect_vconcat is only
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #3 f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111064
--- Comment #3 from Hongtao.liu ---
I didn't find any regression when testing the patch.
I guess it's because my tester does full-copy runs and the options are -march=native
-Ofast -flto -funroll-loop.
Let me verify it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111062
--- Comment #1 from Hongtao.liu ---
(In reply to Zdenek Sojka from comment #0)
> Created attachment 55755 [details]
> reduced testcase
>
> Compiler output:
> $ x86_64-pc-linux-gnu-gcc -O -mavx10.1-256 -mavx512bw -mno-avx512f testcase.c
> cc1: w
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110966
--- Comment #4 from Hongtao.liu ---
(In reply to anlauf from comment #3)
> (In reply to Hongtao.liu from comment #2)
> > (In reply to Richard Biener from comment #1)
> > > I think matmul is fine with avx512f or avx, so requiring/using only the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110979
Bug ID: 110979
Summary: Miss-optimization for O2 fully masked loop on floating
point reduction.
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110966
--- Comment #2 from Hongtao.liu ---
(In reply to Richard Biener from comment #1)
> I think matmul is fine with avx512f or avx, so requiring/using only the base
> ISA level sounds fine to me.
This could be a potential missed optimization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110966
Bug ID: 110966
Summary: should matmul_c8_avx512f be updated with
matmul_c8_x86-64-v4.
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926
--- Comment #10 from Hongtao.liu ---
Fixed in GCC14.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921
--- Comment #11 from Hongtao.liu ---
(In reply to 罗勇刚(Yonggang Luo) from comment #10)
> (In reply to Hongtao.liu from comment #9)
>
> > > Without `-mbmi` option, gcc can not compile and all other three compiler
> > > can compile.
> >
> > As lo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921
--- Comment #9 from Hongtao.liu ---
> There is a redundant xor instruction,
There's a false dependence issue on some specific processors.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011
> Without `-mbmi` option, gcc can not compile and all o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926
--- Comment #8 from Hongtao.liu ---
(In reply to Alexander Monakov from comment #7)
> Thanks for identifying the problem. Please don't rename the argument to
> 'op_mask' though: the parameter itself is not a mask, it's an eight-bit
> control wor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926
--- Comment #6 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #5)
> I'm working on a patch.
int
-vpternlog_redundant_operand_mask (rtx *operands)
+vpternlog_redundant_operand_mask (rtx op_mask)
{
int mask = 0;
- int imm8 = XIN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926
--- Comment #5 from Hongtao.liu ---
I'm working on a patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921
--- Comment #7 from Hongtao.liu ---
(In reply to 罗勇刚(Yonggang Luo) from comment #6)
> MSVC also added, clang seems have optimization issue, but MSVC doesn't have
> that
No, I think what clang does is correct,
f(int, int):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921
--- Comment #5 from Hongtao.liu ---
Maybe the source code can be changed as follows:
int f(int a, int b)
{
#ifdef __BMI__
return _tzcnt_u32 (a);
#else
return _bit_scan_forward (a);
#endif
}
But looks like clang/MSVC doesn't su
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921
--- Comment #4 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #3)
> But there's difference between TZCNT and BSF
>
> The key difference between TZCNT and BSF instruction is that TZCNT provides
> operand size as output when source o
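The difference described in the quote can be modeled in portable C. The function names below are hypothetical and the code does not use the real intrinsics; it assumes the documented behavior that TZCNT yields the operand size for a zero source while BSF leaves its destination undefined in that case, which the sketch represents with a -1 sentinel.

```c
#include <stdint.h>

/* Model of TZCNT on a 32-bit operand: defined for every input,
   returning the operand size (32) when the source is zero. */
unsigned tzcnt32_model(uint32_t x)
{
    if (x == 0)
        return 32;
    unsigned n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Model of BSF on a 32-bit operand: the index of the lowest set bit.
   For a zero source the hardware result is undefined, so this model
   returns -1 as a sentinel instead of pretending there is an answer. */
int bsf32_model(uint32_t x)
{
    if (x == 0)
        return -1;
    return (int)tzcnt32_model(x);
}
```

For nonzero inputs the two models agree, which is why code that never passes zero can use either instruction; the choice only matters when zero is a possible input.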
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762
--- Comment #23 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #22)
> It looks to me that partial vector half-float instructions have the same
> issue.
Yes, I'll take a look.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904
--- Comment #7 from Hongtao.liu ---
>
> to .VEC_ADDSUB possibly loses exceptions (the vectorizer now directly
> creates .VEC_ADDSUB when possible).
Let's put it under -fno-trapping-math.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904
--- Comment #5 from Hongtao.liu ---
(In reply to Richard Biener from comment #1)
> Hmm, I think the issue is we see
>
> f (__m128d x, __m128d y, __m128d z)
> {
> vector(2) double _4;
> vector(2) double _6;
>
>[100.00%]:
> _4 = x_2(D)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904
--- Comment #4 from Hongtao.liu ---
(In reply to Richard Biener from comment #2)
> __m128d h(__m128d x, __m128d y, __m128d z){
> __m128d tem = _mm_mul_pd (x,y);
> __m128d tem2 = tem + z;
> __m128d tem3 = tem - z;
> return __builti
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #9