Hello Racketeers,

I'm slightly stuck with speeding up some calculations and the reason is
that I need to compute the index of highest set bit in a number. So for
1 it is 0, for 2 it is 1, for 8 3, 1023 9 and 1024 10 ...

The fastest Racket code I can come up with is as follows:

(begin-encourage-inline
  (define (highest-bit num)
    (let loop ((num num)
               (bit 0))
      (if (unsafe-fx> num 0)
          (loop (unsafe-fxrshift num 1)
                (unsafe-fx+ bit 1))
          bit))))

(Yes, it returns incorrect result for 0, but that must be checked
elsewhere as the result cannot be defined for 0 anyway).

However this is a single instruction operation on x86 (bsr) or two
instruction operation (rbit and clz if I am not mistaken) on ARM. Dunno
about PPC yet.

I looked at CS internals a bit and although there is a "name collision"
as the bsr mnemonics is used for ret (branch subroutine return?), it
should be fairly easy to add something like fxbsr (-> fixnum? fixnum?)
to Racket core.

I also experimented with x64asm package but the results were even worse
than the aforementioned tight loop (there is a lot of overhead in
define-λ! generated functions).

As the routine is typically called 40k-60k times per frame in my code
(real-time rendering) it could really make a difference.

Should I try to add it? How should it be named? What about BC?

And more generic question - aren't there more similar operations that
can be implemented efficiently on all supported CPUs that might be
useful in general? Yes, I am aiming at SIMD as many of you know :)
Especially with the expanded number of available FP registers to 8 on
64-bit x86 CS there surely is quite some potential to it.


Cheers,
Dominik

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/a2a4fe83-877d-d7ed-4812-bd628c128659%40trustica.cz.

Reply via email to