[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)

Ulrich Weigand via cfe-commits Fri, 04 Apr 2025 12:23:07 -0700

uweigand wrote:

> Moved the comment "Promoting the result to i64...so use the default 
> expansion" that was present in SystemZISelLowering.cpp into this method, but 
> don't quite understand it fully. Is this talking about promoting to signed 
> i64?


This is about whether we can (and should) implement a fp->u32 conversion via 
fp->s64->u32.   If the input is in the valid range for an u32, this will always 
result in the correct output.  However, the question is what happens if the 
input is *outside* that range.  The z196 instruction follows the IEEE rules and 
will generate an invalid operation exception, and I think we intended to match 
that semantics with the expanded code for z10.  The fp->s64->u32 expansion 
would not generate an invalid operation exception if the input value is inside 
the s64 but outside the u32 range, that's why we chose not to use it.

However, now that I'm looking at it again, this decision seems questionable for 
at least two reasons:
- the LLVM IR explicitly marks that case as undefined: "The `fptoui` 
instruction converts its floating-point operand into the nearest (rounding 
towards zero) unsigned integer value. If the value cannot fit in ty2, the 
result is a poison value."
- even the more complex expansion still doesn't always signal invalid operation 
(it does for values > UINT_MAX, but not for (all) negative values)

Maybe we should re-think this, but then again z10 doesn't really matter anymore 
at this point, so it probably doesn't make much sense to change codegen now ...

> FP_TO_INT:
> 
> Z10 unsigned: Expanding fp-> i32/i64 seems to work best to do without first 
> doing the custom extension to f32 in the f16 input case. 
> TargetLowering::expandFP_TO_UINT() can then look at the types and convert to 
> fp_to_sint and avoid the longer sequence used for float.

Right, that makes sense (there is again the question of out-of-range inputs, 
but given the above discussion, I think this fine).

> For i128 which is not a legal type so this needs to be handled similarly in 
> LowerOperationWrapper() instead.

I see that the LowerOperationWrapper still emits i128 operations.  (E.g. you 
expand a f16->i128 into a f16->f32 and f32->i128)   But that routine is called 
because of the illegal input type i128, so I understand it must not leave any 
operations with the illegal type; rather, it should complete the expansion 
fully.   This is annoying since for legal i128 we do the expansion in 
LowerOperation while for illegal i128 we do it in LowerOperationWrapper.   This 
was the same problem for the atomics, where I avoided code duplication by 
implementing the expansion in LowerOperationWrapper and then simply calling 
that routine from LowerOperation.

> Unfortunately it is not possible to return and select "Expand or LibCall", so 
> for the i128 case it seems to be necessary to emit the libcall using the 
> makeLibCall() method.

I see, this is unfortunate, but seems the only option.

> INT_TO_FP:
> 
> Z10 unsigned: Inlined the Promotion of i32 via f32. i64 first gets done on 
> f32, and then expanded.

Why can't you still do the promotion via the Promote action for i32?   Then you 
wouldn't have to duplicate the common-code case ...
 
>     * FCOPYSIGN: test for non-existing library function removed, but handling 
> remains for
>       the intrinsic that is used.

Ah, this implementation is quite inefficient.  There should be no extend/trunc 
libcalls needed at all: the sign bit in f16 is in the same place as f32 or f64, 
so the actual CPSDR instruction should simply work on f16 too.



https://github.com/llvm/llvm-project/pull/109164
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)

Reply via email to