On 11/17/2017 04:09 PM, Palmer Dabbelt wrote:
From: Kito Cheng <kito.ch...@gmail.com>
2017-11-17 Kito Cheng <kito.ch...@gmail.com>
* longlong.h [__riscv] (__umulsidi3): Define.
[__riscv] (umul_ppmm) Likewise.
[__riscv] (__muluw3) Likewise.
Apparently the point of this is that by defining __mulsi3/__muldi3 as an
extended asm, we get better register allocation, and hence better
performance. It would be nice if this info was provided when a patch
was submitted, so that we don't have to figure it out ourselves.
I see one minor issue though
+ (w0) = __muluw3 (__ll_lowpart (__x1), __ll_B) \
+ + __ll_lowpart (__x0); \
The multiply here is just a shift and should be written as a shift by
(W_TYPE_SIZE / 2).
You copied this code from the default definition which uses *, and
assumes that the compiler will optimize the multiply into a shift.
However, because __muluw3 expands into a call to __mulsi3/__muldi3 for a
risc-v part with no multiply, the optimization will not take place here,
and you end up with one extra unnecessary multiply operation.
Otherwise this looks OK.
Jim