2007/4/9, Joe Buck <[EMAIL PROTECTED]>:
On Mon, Apr 09, 2007 at 09:47:07AM -0700, Andrew Pinski wrote:
> On 4/9/07, J.C. Pizarro <[EMAIL PROTECTED]> wrote:
> >#include <stddef.h>
> >
> >void *__allocate_array_OptionA(size_t num, size_t size) { // 1st best
> > unsigned long long tmp = (unsigned long long)size * num;
> > if (tmp >= 0x0000000080000000ULL) tmp=~size_t(0);
> > return operator new[](tmp);
> >}
>
> First, this just happens to be the best for x86; what about PPC or
> really any embedded target where people are more concerned about code
> size than, say, on x86?
It's nowhere close to best for x86. But to get the best, you'd need
to use assembly language, and the penalty in time is one instruction:
insert a jnc (jump short if no carry), with the prediction flag set
as "taken", after the mull instruction. This would jump over code
to load all-ones into the result. You have to multiply, and the processor
tells you if there's an overflow.
A general approach would be to have an intrinsic for unsigned multiply
with saturation, have a C fallback, and add an efficient implementation of
the intrinsic on a per-target basis.
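A rough sketch of that idea, assuming a compiler that provides
__builtin_mul_overflow (GCC 5 and later do) and falling back to a
portable division check elsewhere. The names sat_mul and allocate_array
are made up for illustration; they are not an existing intrinsic or
part of any proposed patch:

```cpp
#include <cstddef>
#include <limits>
#include <new>

// Hypothetical helper: unsigned multiply that saturates to the largest
// size_t value instead of wrapping around on overflow.
inline std::size_t sat_mul(std::size_t a, std::size_t b) {
    const std::size_t max = std::numeric_limits<std::size_t>::max();
#if defined(__GNUC__) && (__GNUC__ >= 5)
    // Target-aware path: the compiler can use the hardware overflow flag.
    std::size_t r;
    if (__builtin_mul_overflow(a, b, &r)) return max;
    return r;
#else
    // Portable fallback: a * b overflows exactly when b > max / a.
    if (a != 0 && b > max / a) return max;
    return a * b;
#endif
}

// Hypothetical allocation wrapper: a saturated request is expected to
// make operator new[] fail rather than return a too-small block.
inline void *allocate_array(std::size_t num, std::size_t size) {
    return ::operator new[](sat_mul(num, size));
}
```

With this, sat_mul(3, 4) gives 12, while any product that does not fit
in size_t comes back as all-ones.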
To optimize the x86 case even further, one could also:
1. Use imul instead of mul, because it takes slightly fewer cycles.
2. Use jns/js (conditional jumps on the sign flag) instead of jnc/jc
(conditional jumps on the carry flag).
3. Modify the C preprocessor and/or the C/C++ compiler to support:
#if argument X is a constant then
use this code specific of constant X
#else if argument Y is not a constant then
use this code specific of non-constant Y
#else
use this general code
#endif
The 3rd option is too complex, but it is powerful, nearly like a Turing machine ;)
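Something close to the 3rd option can be approximated today with GCC's
__builtin_constant_p, without touching the preprocessor. The following
is only a sketch under that assumption; array_bytes is a made-up name,
and the special case for an element size of 1 is just one example of
"code specific to a constant":

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: byte count for num elements of the given size,
// saturating to the largest size_t value on overflow.
inline std::size_t array_bytes(std::size_t num, std::size_t size) {
    const std::size_t max = std::numeric_limits<std::size_t>::max();
#if defined(__GNUC__)
    // Constant-specific path: if the compiler can prove size == 1 after
    // inlining, no multiply or overflow check is needed at all.
    if (__builtin_constant_p(size) && size == 1) return num;
#endif
    // General path: num * size overflows exactly when num > max / size.
    if (size != 0 && num > max / size) return max;
    return num * size;
}
```

Both paths return the same values, so the result does not depend on
whether the compiler manages to see the argument as a constant; only
the generated code differs.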
J.C. Pizarro