Hello Racketeers,
I'm slightly stuck with speeding up some calculations and the reason is
that I need to compute the index of highest set bit in a number. So for
1 it is 0, for 2 it is 1, for 8 3, 1023 9 and 1024 10 ...
The fastest Racket code I can come up with is as follows:
(begin-encourage-inline
(define (highest-bit num)
(let loop ((num num)
(bit 0))
(if (unsafe-fx> num 0)
(loop (unsafe-fxrshift num 1)
(unsafe-fx+ bit 1))
bit))))
(Yes, it returns incorrect result for 0, but that must be checked
elsewhere as the result cannot be defined for 0 anyway).
However this is a single instruction operation on x86 (bsr) or two
instruction operation (rbit and clz if I am not mistaken) on ARM. Dunno
about PPC yet.
I looked at CS internals a bit and although there is a "name collision"
as the bsr mnemonics is used for ret (branch subroutine return?), it
should be fairly easy to add something like fxbsr (-> fixnum? fixnum?)
to Racket core.
I also experimented with x64asm package but the results were even worse
than the aforementioned tight loop (there is a lot of overhead in
define-λ! generated functions).
As the routine is typically called 40k-60k times per frame in my code
(real-time rendering) it could really make a difference.
Should I try to add it? How should it be named? What about BC?
And more generic question - aren't there more similar operations that
can be implemented efficiently on all supported CPUs that might be
useful in general? Yes, I am aiming at SIMD as many of you know :)
Especially with the expanded number of available FP registers to 8 on
64-bit x86 CS there surely is quite some potential to it.
Cheers,
Dominik
--
You received this message because you are subscribed to the Google Groups
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/racket-users/a2a4fe83-877d-d7ed-4812-bd628c128659%40trustica.cz.