On Apr 21, 12:04 pm, [email protected] wrote:
> Using inline ASM in Python sources isn't an option.

Except when it is. :) There's a tiny amount of inline assembler in the sources already: see Python/pymath.c and Python/ceval.c. Not surprisingly, there's some in the ctypes module as well.

There *are* places where it's very tempting to add a little (optional, removable, carefully tested, etc.) assembler to the long implementation. One main reason that using 30-bit digits for longs is slower (for some benchmarks) than using 15-bit digits on 32-bit platforms is that there's no way to tell C to do a 64-bit by 32-bit division in cases where you know (from an understanding of the algorithm) that the quotient fits into 32 bits. On x86, replacing just two of the divisions in Objects/longobject.c with the appropriate 'divl' inline assembler got me 10% speedups on some benchmarks.

Mark
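
For anyone curious what that kind of 64-bit by 32-bit division looks like, here is a rough sketch. It is not the code from Objects/longobject.c: the helper name div_64_by_32 and its signature are made up for illustration, and it assumes GCC-style extended asm on x86, with a plain C fallback everywhere else. The precondition hi < d is what guarantees that the quotient fits into 32 bits; 'divl' raises a divide error if it doesn't.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of CPython): divide the 64-bit value
   hi * 2**32 + lo by the 32-bit value d.  Returns the quotient and
   stores the remainder in *rem.
   Precondition: hi < d, so the quotient fits into 32 bits. */
static uint32_t
div_64_by_32(uint32_t hi, uint32_t lo, uint32_t d, uint32_t *rem)
{
    assert(hi < d);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t q, r;
    /* divl divides EDX:EAX by its operand, leaving the quotient in EAX
       and the remainder in EDX. */
    __asm__("divl %4"
            : "=a"(q), "=d"(r)          /* outputs: EAX, EDX */
            : "0"(lo), "1"(hi), "rm"(d) /* inputs: EAX, EDX, divisor */
            : "cc");
    *rem = r;
    return q;
#else
    /* Portable fallback: a full 64-bit division, which is the slower
       path that the assembler version is meant to avoid. */
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
#endif
}
```

Called as div_64_by_32(hi, lo, d, &rem), this gives the same result as the cast-to-uint64_t version whenever the precondition holds. Whether it is actually faster depends on the compiler and platform, so it would need benchmarking before being relied on.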
