https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118480

--- Comment #14 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Steven Munroe from comment #11)
> And as you point out the instructions vslo/vsro/vsl/vsr only care about bits
> 121..127. Also older machines needed the byte splat for vsl/vsr.

vslq looks only at bits 121..127 as well.  So it is handier than vsl (it does
not need a splat to all lanes), but it does need p10 of course.

> So for P9 a single xxspltib should do for a vector quadword shift left/right
> by a constant. So how should the average library developer write code to get
> that (optimal) result?

He/she should just write C code to do this (not even use a builtin
function), and trust that the compiler will do the right thing (the best
possible for the selected architecture version, etc.).
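
A minimal sketch of what "just write C code" could look like for a quadword
shift by a constant. It assumes a 64-bit target where GCC provides the
`unsigned __int128` extension; the helper names (`shift_left_q7`,
`shift_right_q7`, `hi64`, `lo64`) and the shift count 7 are made up for
illustration. Which instructions the compiler actually selects for a given
-mcpu is the subject of this bug, so nothing is claimed about that here.

```c
#include <stdint.h>

/* Assumption: GCC/Clang extension type, available on 64-bit targets. */
typedef unsigned __int128 u128;

/* Hypothetical helpers: shift a quadword by a compile-time constant.
   No builtins; instruction selection is left to the compiler. */
static inline u128 shift_left_q7(u128 x)  { return x << 7; }
static inline u128 shift_right_q7(u128 x) { return x >> 7; }

/* Hypothetical accessors for the two doubleword halves, used for checking. */
static inline uint64_t hi64(u128 x) { return (uint64_t)(x >> 64); }
static inline uint64_t lo64(u128 x) { return (uint64_t)x; }
```

Inspecting the output of `gcc -O2 -S` with the -mcpu value of interest shows
what is generated for each architecture version.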

> Also the Intrinsic ref for vec_sl/vec_sr/vec_rl requires the shift count be
> the same size element as the shiftee. This shows up as bad code for
> doubleword shifts on P8/9/10 and quadword shifts on P10. It would be nice
> if these intrinsics would also accept vector char as the shift count for
> all types of shiftee.

It has the shift count in the same size lane, nothing more, nothing less.  How
else could and should it work?
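
To illustrate "shift count in the same size lane", here is a small sketch
using GCC's generic vector extension rather than vec_sl itself, so that it
does not depend on altivec.h. The type name `v2du` and the function name
`shift_left_lanes` are made up for this example. Each lane of the value is
shifted by the count held in the corresponding lane of a vector of the same
element type.

```c
/* Assumption: GCC generic vector extension; two 64-bit lanes in 16 bytes. */
typedef unsigned long long v2du __attribute__((vector_size(16)));

/* Hypothetical helper: element-wise shift left. Lane i of `value` is
   shifted by lane i of `count`; both vectors have the same lane size. */
static inline v2du shift_left_lanes(v2du value, v2du count)
{
    return value << count;
}
```

With value {1, 3} and count {4, 1}, lane 0 becomes 1 << 4 and lane 1 becomes
3 << 1.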

If you have examples where we do not generate the best code, please attach
them?  With the flags we need to reproduce, etc., of course; you know the
drill :-)
