[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

Phoebe Wang via Phabricator via cfe-commits Thu, 09 Jun 2022 09:04:17 -0700

pengfei added inline comments.


================
Comment at: llvm/test/Analysis/CostModel/X86/fptoi_sat.ll:852
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %f16u1 
= call i1 @llvm.fptoui.sat.i1.f16(half undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %f16s8 
= call i8 @llvm.fptosi.sat.i8.f16(half undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %f16u8 
= call i8 @llvm.fptoui.sat.i8.f16(half undef)
----------------
LuoYuanke wrote:
> It seems the cost is reduced in general. Is it because we pass/return f16 by 
> xmm register?
No. It's because we don't have cost model for `f16`. I added some in D127386 to 
address this.


================
Comment at: llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir:31
   ; CHECK-LABEL: name: test
-  ; CHECK: INLINEASM &foo, 0 /* attdialect */, 4390922 /* regdef:GR64 */, def 
$rsi, 4390922 /* regdef:GR64 */, def dead $rdi,
-    INLINEASM &foo, 0, 4390922, def $rsi, 4390922, def dead $rdi, 2147549193, 
killed $rdi, 2147483657, killed $rsi, 12, implicit-def dead early-clobber 
$eflags
+  ; CHECK: INLINEASM &foo, 0 /* attdialect */, 4456458 /* regdef:GR64 */, def 
$rsi, 4456458 /* regdef:GR64 */, def dead $rdi,
+    INLINEASM &foo, 0, 4456458, def $rsi, 4456458, def dead $rdi, 2147549193, 
killed $rdi, 2147483657, killed $rsi, 12, implicit-def dead early-clobber 
$eflags
----------------
LuoYuanke wrote:
> Why f16 patch affect this test case? There is no fp instruction in this test 
> case.
I this it's newly added `FR16` that affects all number the other number 
register class. We met the problem when enabling FP16 too.


================
Comment at: llvm/test/CodeGen/X86/atomic-non-integer.ll:253
+; X64-SSE-NEXT:    movzwl (%rdi), %eax
+; X64-SSE-NEXT:    pinsrw $0, %eax, %xmm0
+; X64-SSE-NEXT:    retq
----------------
LuoYuanke wrote:
> I notice X86-SSE1 return by GPR. Should we also return by GPR for X64-SSE?
No. The result in X86-SSE in UB. We support the emulation on SSE2 and later.


================
Comment at: llvm/test/CodeGen/X86/avx512-insert-extract.ll:2307
+; SKX-NEXT:    vmovd %ecx, %xmm0
+; SKX-NEXT:    vcvtph2ps %xmm0, %xmm0
+; SKX-NEXT:    vmovss %xmm0, %xmm0, %xmm0 {%k2} {z}
----------------
LuoYuanke wrote:
> Is code less efficient than previous code? Why previous code still works 
> without convert half to float?
Yes. The previous code using `i16` for FP16. Improved, thanks!


================
Comment at: llvm/test/CodeGen/X86/callbr-asm-bb-exports.ll:20
 ; CHECK-NEXT: t22: ch,glue = CopyToReg t17, Register:i32 %5, t8
-; CHECK-NEXT: t30: ch,glue = inlineasm_br t22, TargetExternalSymbol:i64'xorl 
$0, $0; jmp ${1:l}', MDNode:ch<null>, TargetConstant:i64<8>, 
TargetConstant:i32<2293769>, Register:i32 %5, TargetConstant:i64<13>, 
TargetBlockAddress:i64<@test, %fail> 0, TargetConstant:i32<12>, Register:i32 
$df, TargetConstant:i32<12>, Register:i16 $fpsw, TargetConstant:i32<12>, 
Register:i32 $eflags, t22:1
+; CHECK-NEXT: t30: ch,glue = inlineasm_br t22, TargetExternalSymbol:i64'xorl 
$0, $0; jmp ${1:l}', MDNode:ch<null>, TargetConstant:i64<8>, 
TargetConstant:i32<2359305>, Register:i32 %5, TargetConstant:i64<13>, 
TargetBlockAddress:i64<@test, %fail> 0, TargetConstant:i32<12>, Register:i32 
$df, TargetConstant:i32<12>, Register:i16 $fpsw, TargetConstant:i32<12>, 
Register:i32 $eflags, t22:1
 
----------------
LuoYuanke wrote:
> Why this test is affacted? Is it caused by calling convention change?
No. It's caused by newly added `FR16` register class.


================
Comment at: llvm/test/CodeGen/X86/fmf-flags.ll:115
-; X64-NEXT:    movzwl %di, %edi
-; X64-NEXT:    callq __gnu_h2f_ieee@PLT
 ; X64-NEXT:    mulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
----------------
LuoYuanke wrote:
> Does __gnu_h2f_ieee retrun from xmm?
There does not exist a `__gnu_h2f_ieee` on X86 before. It's ARM/AArch64 
specific.


================
Comment at: llvm/test/CodeGen/X86/fpclamptosat.ll:569
 ; CHECK-NEXT:    cvttss2si %xmm0, %rax
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
----------------
LuoYuanke wrote:
> I'm curious why there is 1 more compare in this patch. 
It's an optimization implemented by D111976. We don't meet the requirment that 
`isOperationLegalOrCustom`. It's not easy to solve because we need to check the 
promoted type instead. I'll leave it as is.


================
Comment at: llvm/test/CodeGen/X86/fpclamptosat.ll:776
+; CHECK-NEXT:    cmovael %eax, %ecx
+; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movl $2147483647, %edx # imm = 0x7FFFFFFF
----------------
LuoYuanke wrote:
> Ditto.
The same as above.


================
Comment at: llvm/test/CodeGen/X86/fpclamptosat_vec.ll:605
+; CHECK-NEXT:    .cfi_def_cfa_offset 80
+; CHECK-NEXT:    movss %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT:    movss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
----------------
LuoYuanke wrote:
> Is the vector <4 x half> split to 4 scalar and pass by xmm? What's the ABI 
> for vector half? Is there any case that test the scenario that run out of 
> register and pass parameter through stack?
Good question! Previously, I discussed with GCC folks we won't support vector 
in emulation. I expected the FE with pass whole vector through stack. So a 
vector in IR is illegal to ABI and can be splited.
But seems GCC passes it by vector register. https://godbolt.org/z/a67rMhTW6
I'll double confirm with GCC folks.


================
Comment at: llvm/test/CodeGen/X86/fptosi-sat-scalar.ll:2138
+; X64-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; X64-NEXT:    movl $255, %eax
+; X64-NEXT:    cmovael %ecx, %eax
----------------
LuoYuanke wrote:
> It seems less efficient than previous code on NAN, zero handling, but we can 
> improve later.
Yes. Added FIXMEs.


================
Comment at: llvm/test/CodeGen/X86/half.ll:946
+; CHECK-I686-NEXT:    calll __extendhfsf2
+; CHECK-I686-NEXT:    fstps {{[0-9]+}}(%esp)
+; CHECK-I686-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
----------------
LuoYuanke wrote:
> Why the x87 instruction is generated?
On 32 bit, float and double are passed by x87 register.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

Reply via email to