https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103462
--- Comment #1 from Hongtao.liu ---
Should it be done in the vectorizer or in ldist (just like memory ops), or somewhere
else?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103462
--- Comment #2 from Hongtao.liu ---
The bit clear and the induction variable could be simplified to `& CONSTANT`.
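The loop from the report is not shown in this snippet. As a hypothetical illustration only, here is a loop that clears one bit per iteration using the induction variable. With constant bounds it computes the same value as a single AND with a precomputed mask. The bounds (8 and 24) and the function names are assumptions made for this sketch.

```c
#include <stdint.h>

/* Hypothetical loop shape (not taken from the PR): clear bits 8..23 of X
   one at a time, indexed by the induction variable I.  */
uint64_t clear_bits_loop (uint64_t x)
{
  for (int i = 8; i < 24; i++)
    x &= ~((uint64_t) 1 << i);
  return x;
}

/* Closed form: with constant bounds the mask is a compile-time constant,
   so the whole loop reduces to one AND.  Bits 8..23 form 0xffff00.  */
uint64_t clear_bits_and (uint64_t x)
{
  return x & ~UINT64_C (0x0000000000ffff00);
}
```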
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103463
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95740
--- Comment #4 from Hongtao.liu ---
It can be fixed by
2 files changed, 4 insertions(+), 2 deletions(-)
gcc/config/i386/i386.c | 2 +-
gcc/config/i386/i386.h | 4 +++-
modified gcc/config/i386/i386.c
@@ -19194,7 +19194,7 @@ ix86_preferred_reloa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95740
--- Comment #5 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #4)
> It can be fixed by
>
> 2 files changed, 4 insertions(+), 2 deletions(-)
> gcc/config/i386/i386.c | 2 +-
> gcc/config/i386/i386.h | 4 +++-
>
> modified gcc/config/
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103463
--- Comment #3 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #1)
> It should be fixed by
> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585613.html
Hmm, it looks to be broken again.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103463
--- Comment #4 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #3)
> (In reply to Hongtao.liu from comment #1)
> > It should be fixed by
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585613.html
>
> Hmm, it looks to be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103463
--- Comment #5 from Hongtao.liu ---
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c88374c9d2b..4e9fae80479 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11512,6 +11512,7 @@ (define_insn "*x86_64_sh
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103484
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
Hongtao.liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103463
Hongtao.liu changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103484
--- Comment #6 from Hongtao.liu ---
Fixed in GCC12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103144
--- Comment #2 from Hongtao.liu ---
Another issue is with SLP: when the trip count is small and the loop is completely
unrolled, SLP fails to generate vlshr_optab.
#include
void
foo (uint64_t* __restrict pdst, uint64_t* psrc, uint64_t shift)
{
for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103519
--- Comment #2 from Hongtao.liu ---
get_mem_refs_of_builtin_call doesn't handle target-specific builtins.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95740
--- Comment #7 from Hongtao.liu ---
Fixed in GCC12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
--- Comment #8 from Hongtao.liu ---
> but the x86 backend chooses to not let the vectorizer compare costs with
> different vector sizes but instead asks it to pick the first working
> solution from the vector of modes to consider (and in that or
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #2 from Hongtao.liu ---
>
> Also, baz is highly un-optimal for 32bit targets.
Yes, it needs to be fixed. Note that with -mavx512fp16 the codegen for baz is optimal on
the 32-bit target; maybe it's related to vector_mode_supported_p, but then why code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #5 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #4)
> (In reply to Hongyu Wang from comment #3)
>
> > So we may need to support V8HFmode in VALID_SSE2_REG_MODE if we don't want
> > to modify those function_args and fu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
--- Comment #10 from Hongtao.liu ---
Got it, thanks for your detailed explanation. So there are 2 issues in this case:
first, the x86 target didn't choose the vector size with the smallest cost; second, BB
vectorization with gaps at the end of a load is not suppor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #8 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #6)
> (In reply to Hongtao.liu from comment #5)
>
> > There're several places in i386-expand.c which assume TARGET_AVX512FP16 for
> > case V8HF/V16HF/V32HF, if we want to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100738
Hongtao.liu changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #10 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #9)
> (In reply to Hongtao.liu from comment #8)
> > (In reply to Uroš Bizjak from comment #6)
> > > (In reply to Hongtao.liu from comment #5)
> > >
> > > > There're seve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #15 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #12)
> (In reply to Hongtao.liu from comment #10)
>
> > Sure.
> Please find attached the complete patch that enables HF vector modes in
> Comment #11. The patch survives
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #16 from Hongtao.liu ---
There are already testcases for vec_extract/vec_set/vec_duplicate, but those
testcases are written under TARGET_AVX512FP16. I'll make a copy of them and
test them without AVX512FP16.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
--- Comment #12 from Hongtao.liu ---
(In reply to rguent...@suse.de from comment #11)
> On Tue, 7 Dec 2021, crazylht at gmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
> >
> > --- Comment #10 from Hongtao.liu --
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #17 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #16)
> There're already testcases for vec_extract/vec_set/vec_duplicate, but those
> testcases are written under TARGET_AVX512FP16, i'll make a copy of them and
> test th
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #18 from Hongtao.liu ---
Codegen for foo1/foo2 is suboptimal under -mavx2. I guess we can add
vec_setv16hf_0 and implement it with vpblendw.
typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
typedef _Float16 __m256h __attribute__
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #19 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #17)
> (In reply to Hongtao.liu from comment #16)
> > There're already testcases for vec_extract/vec_set/vec_duplicate, but those
> > testcases are written under TARGET_A
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #20 from Hongtao.liu ---
V2HF/V4HF should also be restricted under AVX512FP16.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #22 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #21)
> (In reply to Hongtao.liu from comment #19)
> > (In reply to Hongtao.liu from comment #17)
> > > (In reply to Hongtao.liu from comment #16)
> > > > There're already t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103682
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103682
--- Comment #5 from Hongtao.liu ---
Fixed in GCC12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658
--- Comment #25 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #5)
> Created attachment 47927 [details]
> Prototype patch v2
>
> A couple of typos fixed.
>
> Still doesn't vectorize v4qi->v4si, v2qi->v2di, v2hi->v2di and v4qi->v4di.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846
--- Comment #9 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #7)
> With just -mavx512f we produce a bunch of instructions (looking like we went
> to scalar mode) while LLVM is able to produce:
> foo(short __vector(16)):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103462
--- Comment #5 from Hongtao.liu ---
Created attachment 52004
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52004&action=edit
Tested patch, waiting for GCC 13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103462
--- Comment #6 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #5)
> Created attachment 52004 [details]
> Tested patch, waiting for GCC 13.
I added an error in the patch to see if there's any code in GCC which could be
optimized; it turns out t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101796
Hongtao.liu changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103194
--- Comment #23 from Hongtao.liu ---
(In reply to Jakub Jelinek from comment #22)
> (In reply to Hongtao.liu from comment #15)
> > > Is the behavior well defined for n >= 64? I got
> > >
> > > foo.c:11:19: warning: left shift count >= width of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734
--- Comment #2 from Hongtao.liu ---
(In reply to Tamar Christina from comment #0)
> When using --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=20 on
> imagick the hot functions MorphologyApply and GetVirtualPixelsFromNexus get
> repla
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102944
--- Comment #6 from Hongtao.liu ---
(In reply to Martin Sebor from comment #5)
> I don't see any of the FAILs or XFAILs listed in comment #0 with cross
> compilers for any of the Targets. Can this report be resolved?
I think so; now we only ha
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #1 from Hongtao.liu ---
kmovw here is a zero_extend, and at the GIMPLE level it's not redundant in the loop.
_31 = MEM[(const __m256i_u * {ref-all})n_5];
_30 = MEM[(const __m256i_u * {ref-all})n_5 + 32B];
_28 = VIEW_CONVERT_EXPR<__v16hi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #2 from Hongtao.liu ---
Failed here
/* Allow propagations into a loop only for reg-to-reg copies, since
replacing one register by another shouldn't increase the cost. */
struct loop *def_loop = def_insn->bb ()->cfg_bb ()->
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #3 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #2)
> Failed here
>
> /* Allow propagations into a loop only for reg-to-reg copies, since
> replacing one register by another shouldn't increase the cost. */
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #4 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #3)
> (In reply to Hongtao.liu from comment #2)
> > Failed here
> >
> > /* Allow propagations into a loop only for reg-to-reg copies, since
> > replacing one regis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #10 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #9)
> (In reply to Thiago Macieira from comment #0)
> > Testcase:
> ...
> > The assembly for this produces:
> >
> > vmovdqu16 (%rdi), %ymm1
> > vmo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98648
--- Comment #6 from Hongtao.liu ---
Fixed by r12-6071-g19dcecd963295b02b96c8cac57933657dbe3234a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98468
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #6 f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #11 from Hongtao.liu ---
(In reply to Thiago Macieira from comment #6)
> It got worse. Now I'm seeing:
>
> .L807:
> vmovdqu16 (%rsi), %ymm2
> vmovdqu16 32(%rsi), %ymm3
> vpcmpuw $6, %ymm0, %ymm2,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #12 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #11)
> (In reply to Thiago Macieira from comment #6)
> > It got worse. Now I'm seeing:
> >
> > .L807:
> > vmovdqu16 (%rsi), %ymm2
> > vmovdqu16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #13 from Hongtao.liu ---
Created attachment 52031
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52031&action=edit
untested patch.
The attached patch can optimize #c0 to
vmovdqu (%rdi), %ymm1
vmovdqu16 32(%r
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
Hongtao.liu changed:
What|Removed |Added
Attachment #52031 is obsolete|0 |1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #15 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #14)
> Created attachment 52032 [details]
> update patch
>
> Update patch, Now gcc can generate optimal code
>
The current fix adds define_insn_and_split patterns for 3 things:
1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102080
--- Comment #3 from Hongtao.liu ---
(In reply to H.J. Lu from comment #2)
> It is caused by r12-2679.
Mine.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #13 from Hongtao.liu ---
Fold shufps to VEC_PERM_EXPR, but 2 shufps are still generated.
__m128 f (__m128 a, __m128 b)
{
vector(4) float _3;
vector(4) float _5;
vector(4) float _6;
;; basic block 2, loop depth 0
;;pred:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147
--- Comment #20 from Hongtao.liu ---
Fixed in GCC12; now GCC generates optimal code.
main:
.LFB532:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movaps .LC0(%rip), %xmm0
call printv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102080
--- Comment #4 from Hongtao.liu ---
diff --git a/test.c.032t.ccp1 b/test.c.033t.forwprop1
index 5b18739..c6f0587 100644
--- a/test.c.032t.ccp1
+++ b/test.c.033t.forwprop1
@@ -31,11 +31,12 @@ void EncodedFromDisplay ()
__m256 __trans_tmp_11;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102080
--- Comment #9 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #8)
> That is the mask is a vector mode still for these patterns according to the
> internals doc.
> Rather than the scalar mode you have:
> (match_operand: 1 "register_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102080
--- Comment #11 from Hongtao.liu ---
Created attachment 51363
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51363&action=edit
Proposed patch
I'm testing this patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101472
--- Comment #5 from Hongtao.liu ---
Fixed in GCC12, backport to GCC11 and GCC10.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #15 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #14)
> (In reply to Hongtao.liu from comment #13)
> > Fold shufps to VEC_PERM_EXPR, but 2 shufps are still generated.
> >
> > __m128 f (__m128 a, __m128 b)
> > {
> >
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #16 from Hongtao.liu ---
typedef int v4si __attribute__ ((vector_size(16)));
v4si f(v4si a, v4si b) {
v4si a1 = __builtin_shufflevector (a, a, 2, 3 ,1 ,0);
v4si b1 = __builtin_shufflevector (b, a, 2, 3 ,1 ,0);
return a1 *
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #18 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #17)
> (In reply to Hongtao.liu from comment #16)
> > typedef int v4si __attribute__ ((vector_size(16)));
> >
> > v4si f(v4si a, v4si b) {
> > v4si a1 = __builtin_s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101796
--- Comment #4 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #3)
> Combine is able to do the combine but it fails as it does not match:
> Trying 10, 9 -> 14:
>10: r92:HI=0x3
> 9: r91:V32HI=vec_duplicate(r92:HI)
> REG
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51838
--- Comment #2 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #1)
> We do get slightly better now:
> xorl %eax, %eax
> movq %rdi, %r8
> xorl %edi, %edi
> addq %rsi, %rax
> adcq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
--- Comment #7 from Hongtao.liu ---
(In reply to Patrick Palka from comment #3)
> Perhaps related to this PR: On x86_64, the following basic wrapper around
> int128 addition
>
> __uint128_t f(__uint128_t x, __uint128_t y) { return x + y; }
>
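For context on the add/adc sequences discussed in these reports, the following portable sketch models what an `addq`/`adcq` pair computes for a 128-bit sum: add the low halves, derive the carry from wraparound, then add the high halves plus the carry. The `u128_pair` type and `add128` function are hypothetical helpers for illustration, not GCC code.

```c
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value split into 64-bit halves.  */
typedef struct { uint64_t lo, hi; } u128_pair;

u128_pair add128 (u128_pair x, u128_pair y)
{
  u128_pair r;
  r.lo = x.lo + y.lo;             /* like addq: wraps modulo 2^64 */
  uint64_t carry = r.lo < x.lo;   /* carry out: set iff the low add wrapped */
  r.hi = x.hi + y.hi + carry;     /* like adcq: high halves plus carry */
  return r;
}
```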
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102133
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102133
--- Comment #2 from Hongtao.liu ---
I successfully reproduced the error related to 32-bit SPARC libgcc,
but failed to configure for target mcore; I didn't find any reference to it in
https://gcc.gnu.org/install/specific.html
--target=mcore results in
***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102133
--- Comment #3 from Hongtao.liu ---
static inline void
set_rtl (tree t, rtx x)
{
gcc_checking_assert (!x
|| !(TREE_CODE (t) == SSA_NAME || is_gimple_reg (t))
|| (use_register_for_decl (t)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102133
--- Comment #4 from Hongtao.liu ---
>
> and it hit REG_P (XEXP (x, 1)), XEXP (x, 1) is invalid for subreg, so
> set_rtl here doesn't accept subreg?
Typo: it hit the gcc_assert that if X is not a REG, it must be a CONCAT or PARALLEL,
but here it is a SUBR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102133
--- Comment #5 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #4)
> >
> > and it hit REG_P (XEXP (x, 1)), XEXP (x, 1) is invalid for subreg, so
> > set_rtl here doesn't accept subreg?
>
> typo, it hit gcc_assert that if X is not R
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102133
--- Comment #6 from Hongtao.liu ---
The difference in the insn sequences is:
good one:
(insn 5 4 6 (clobber (reg/v:DF 153))
"/scratch/jmyers/glibc/many12/src/gcc/libgcc/libgcc2.c":1948:1 -1
(nil))
(insn 6 5 7 (set (subreg:SI (reg/v:DF 153)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102133
--- Comment #7 from Hongtao.liu ---
Since we also allow something like (concat:(subreg) (subreg)), should we also
allow subreg outside?
gcc_checking_assert (!x
|| !(TREE_CODE (t) == SSA_NAME || is_gimple_reg (t))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102133
--- Comment #8 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #7)
> Since we also allow something like (concat:(subreg) (subreg)), should we
> also allow subreg outside?
>
>gcc_checking_assert (!x
> || !(TRE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102133
--- Comment #12 from Hongtao.liu ---
Fixed in GCC12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102154
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102154
--- Comment #6 from Hongtao.liu ---
Reproduced with a simple testcase
float
foo (long a)
{
union{long a;
float b[2];}c;
c.a = a;
return c.b[1];
}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102154
--- Comment #7 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #6)
> Reproduced with a simple testcase
>
>
> float
> foo (long a)
> {
> union{long a;
> float b[2];}c;
> c.a = a;
> return c.b[1];
> }
(subreg:SF (reg:DI) 4)
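The RTL above reads the float at byte offset 4 of the 64-bit register. On a little-endian target such as ppc64le, bytes 4..7 hold bits 32..63, so `c.b[1]` is the upper half of `a` reinterpreted as a float. The sketch below compares the union form with an explicit shift-and-memcpy form. It uses `int64_t` instead of `long` (an assumption, since `long` is only 64 bits on LP64 targets), and the function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Union form, as in the reduced testcase.  On little-endian targets
   c.b[1] is the float stored in bytes 4..7, i.e. bits 32..63.  */
float foo_union (int64_t a)
{
  union { int64_t a; float b[2]; } c;
  c.a = a;
  return c.b[1];
}

/* Value-based equivalent of the little-endian (subreg:SF (reg:DI) 4):
   take the high 32 bits and reinterpret them as a float.  */
float foo_shift (int64_t a)
{
  uint32_t hi = (uint32_t) ((uint64_t) a >> 32);
  float f;
  memcpy (&f, &hi, sizeof f);
  return f;
}
```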
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102154
--- Comment #8 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #7)
> (In reply to Hongtao.liu from comment #6)
> > Reproduced with a simple testcase
> >
> >
> > float
> > foo (long a)
> > {
> > union{long a;
> > float b[2];}c;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102154
--- Comment #9 from Hongtao.liu ---
>
> (define_insn "movsf_hardfloat"
> [(set (match_operand:SF 0 "nonimmediate_operand"
>"=!r, f, v, wa,m, wY,
> Z, m, wa, !r,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102154
--- Comment #10 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #9)
> >
> > (define_insn "movsf_hardfloat"
> > [(set (match_operand:SF 0 "nonimmediate_operand"
> > "=!r, f, v, wa,m, wY,
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102154
--- Comment #11 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #10)
> (In reply to Hongtao.liu from comment #9)
> > >
> > > (define_insn "movsf_hardfloat"
> > > [(set (match_operand:SF 0 "nonimmediate_operand"
> > >"=!r,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102154
--- Comment #12 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #10)
> (In reply to Hongtao.liu from comment #9)
> > >
> > > (define_insn "movsf_hardfloat"
> > > [(set (match_operand:SF 0 "nonimmediate_operand"
> > >"=!r,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102166
--- Comment #4 from Hongtao.liu ---
Because _tile_loadd is implemented as inline assembly plus macros, if the
__AMX_TILE__ check is removed, no error will be reported when the user does not pass the
-mamx option. So this macro check was added here, but obviously
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102166
--- Comment #7 from Hongtao.liu ---
(In reply to Thiago Macieira from comment #5)
> (In reply to Hongtao.liu from comment #4)
> > Because _tile_loadd is implemented as embedded assembly plus macros, if
> > __AMX_TILE__ is removed, no error will
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102166
--- Comment #8 from Hongtao.liu ---
(In reply to Thiago Macieira from comment #6)
> > I suggest doing as Clang did and make it an intrinsic.
>
> Or even a __builtin_ia32_markamxtile(); intrinsic, which produces the error
> if misused and does a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102166
--- Comment #10 from Hongtao.liu ---
>
> Anyway, I suggest at a minimum removing the #define check. There's little
> harm in having no diagnostic on misuse: misuses are probably going to be
> seen when testing. Until GCC is able to generate AMX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102182
Bug ID: 102182
Summary: Runtime error for
gcc.dg/torture/fp-int-convert-float16.c
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: wrong-code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102182
Hongtao.liu changed:
What|Removed |Added
CC||hjl.tools at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102182
--- Comment #2 from Hongtao.liu ---
Reproducer:
#include
int
main (void)
{
static volatile unsigned int ivin, ivout;
static volatile _Float16 fv1, fv2;
ivin = ((unsigned int)1);
fv1 = ((unsigned int)1);
fv2 = ivin;
ivout = fv2;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102182
--- Comment #3 from Hongtao.liu ---
during pass_expand we got
(debug_insn 24 23 0 (debug_marker) "test1.c":10:3 -1
(nil))
;; fv2.1_3 ={v} fv2;
(insn 25 24 0 (set (reg:HF 84 [ fv2.1_3 ])
(mem/v/c:HF (symbol_ref:SI ("fv2.1") [flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102182
--- Comment #4 from Hongtao.liu ---
After emitting the libcall in convert_to_mode, maybe_emit_unop_insn failed, so all
insns were deleted, but `from` had already been overridden here; it seems to be a bug.
if (icode != CODE_FOR_nothing)
{
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102186
--- Comment #3 from Hongtao.liu ---
A patch is posted at
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578746.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102154
--- Comment #15 from Hongtao.liu ---
(In reply to Segher Boessenkool from comment #14)
> (In reply to Jonathan Wakely from comment #13)
> > Is this also the cause of several libstdc++ FAILs on ppc64le?
>
>
> Yes.
>
> I have asked for reversio
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211
Bug ID: 102211
Summary: ICE introduced by r12-3277
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211
--- Comment #1 from Hongtao.liu ---
But it's ok for
float
foo (float a, long b)
{
union{float a[2];
long b;}c;
c.b = b;
return c.a[0];
}
foo:
fmv.w.x fa0,a0
ret
This means a move between a GPR and a float reg is allo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211
--- Comment #2 from Hongtao.liu ---
According to *movsi_internal and *movdi_64bit, SImode and DImode can be placed
into FP_REGS, but in riscv_hard_regno_mode_ok, SImode/DImode is not allowed to
be allocated to FP_REGS; the mismatch here causes t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473
--- Comment #8 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #7)
> The UNSPEC_MASKOP ones are still there.
>
> PR 93885 is the same issue.
void test(void* data, void* data2)
{
__m128i v = _mm_load_si128((__m128i const*)data);
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473
--- Comment #9 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #8)
> (In reply to Andrew Pinski from comment #7)
> > The UNSPEC_MASKOP ones are still there.
> >
> > PR 93885 is the same issue.
>
> void test(void* data, void* data2)
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82139
--- Comment #2 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #1)
> It is worse on the trunk:
> .L2:
> movdqu (%rdi), %xmm1
> movdqu (%rdi), %xmm0
> addq $16, %rdi
> paddd %xmm3, %xmm1
>