On Tue, Nov 4, 2008 at 6:33 PM, Tim Peters <[EMAIL PROTECTED]> wrote: > Whenever > two digits are multiplied, the code intends to cast (at least) one of > them to "twodigits" first (if you ever see a spot where this doesn't > happen, it's a bug). "twodigits" is typedef'ed to C long. C89 > guarantees that a long holds at least 32 bits.
There are indeed one or two spots that seem to be missing a cast, for example the line "rem -= hi*n" in inplace_divrem1. I've fixed all those I found, in the issue 4258 patch. >> And for 32x32 -> 64, can't this simply be replaced by "(uint64_t)i * j", > 1. That's C99, not C89, and therefore less portable. Understood; my thought was to use uint32_t and uint64_t for digit and twodigits when available (with longs being stored in base 2**30), falling back to the current ushort/ulong/base 2**15 otherwise. On Unix, autoconf makes this easy by providing macros like 'AC_TYPE_INT32_T', which makes sure that int32_t is defined to be an integer type of exact width 32, when available. > 2. On platforms that support it, this is at least 64x64->64 multiplication, > potentially much more expensive than the 32x32->64 (or 31x31->62?) > flavor you /intend/ to move to. > > 3. There's no way to know exactly what compilers will do with this short of > staring at generated code. Yes---I've done the staring for gcc, so I now have precisely *one* data point, which is that various flavours of gcc on x86/x86_64 *are* clever enough to turn (uint64_t)my_uint32 * my_other_uint32 into the appropriate HW instruction. Unfortunately I don't have easy access to other compilers or platforms right now. :-( Am still working on the benchmarking, but I'm definitely seeing speedup on gcc/x86---between 10% and 100% depending on the operations. > FYI, in a previous life working in speech recognition, under > Microsoft's Visual Studio 6 the only way we found to get at the > Pentium's 32x32->64 HW ability efficiently was via using inline > assembler. Urk. We definitely don't want to go there. Though I guess this is how packages like gmp and GP/Pari manage. > C89). That's why it's impossible here to write portable code in C > that's also efficient. Even what Python does now is vulnerable on the But maybe it's possible to write portable code (by providing fallbacks) that turns out to be efficient on the majority of mainstream systems? The extent of the ifdef'ery in the patch is really rather small: one (compound) #ifdef in longintrepr.h for defining digit, twodigits, stwodigits etc, and a couple more for the places where digits are read and written in marshal.c. >> I agree that very-long-integer optimizations probably don't really belong in >> Python, > > Depends in part on whether Python can attract as many obsessed > maintainers and porters for such gonzo algorithms as GMP attracts ;-) Well, you can count me in. :) Mark _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com