I think it would make sense to leave the exact vector layout, such as
VLEN and LMUL, to the caller.
Attached is an attempt at a vectorized implementation of sin and cos
that supports LMUL values of m1 and m2 while using no more than a
quarter of the vector registers.
The function could live in libgcc and be used via a special pattern in
the machine description that spells out the exact list of clobbers.
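
To make the register budget concrete, here is a small sketch of the
arithmetic in C. It is not taken from sin.S. It only assumes that the
"quarter" is counted in architectural registers out of the 32 that RVV
defines (v0..v31) and that an LMUL=m2 group occupies two of them. The
helper names vreg_budget and vreg_groups are made up for illustration.

```c
#include <assert.h>

/* RVV defines 32 architectural vector registers, v0..v31. */
enum { RVV_NUM_VREGS = 32 };

/* Hypothetical helper: how many architectural registers a routine may
   touch if it is limited to a quarter of the register file. */
static int vreg_budget(void)
{
    return RVV_NUM_VREGS / 4;
}

/* Hypothetical helper: how many register groups fit into that budget
   for an integer LMUL (m1 -> 1, m2 -> 2, ...). Each group occupies
   `lmul` consecutive registers. */
static int vreg_groups(int lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    return vreg_budget() / lmul;
}
```

Under these assumptions the routine has eight usable vector values at
m1 but only four at m2, so the m2 case is the one that constrains how
many temporaries the sin/cos kernel can keep live.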

Attachment: sin.S
Description: Binary data


