[issue3451] Asymptotically faster divmod and str(long)

Pernici Mario <[EMAIL PROTECTED]> added the comment:

I have translated into C the algorithm for long division by Burnikel and
Ziegler (BZ), using the Python version fast_div.py and the suggestions by
Mark.

Here is a benchmark for divmod(p, q), with p = 7**np, q = 5**nq and
digits = q_digits = p_digits/2; the time taken by Python division is
normalized to 1.  Tests on Debian, BZ with Python-2.6b3.

             Pentium 1.60GHz     Athlon XP 2600+     Athlon 64 3800+
  digits     BZ     fast_div     BZ     fast_div     BZ     fast_div
  500        1.01   1.27         1.00   1.18         1.00   1.21
  700        0.88   1.22         0.76   1.08         0.81   1.14
  1000       0.82   1.17         0.72   1.04         0.76   1.10
  2000       0.66   0.85         0.55   0.73         0.59   0.78
  4000       0.51   0.62         0.43   0.52         0.45   0.56
  10000      0.32   0.38         0.31   0.37         0.33   0.39
  20000      0.24   0.25         0.23   0.25         0.25   0.27
  100000     0.14   0.14         0.11   0.11         0.12   0.12

BZ starts being faster than Python division around 2000 bits, apart from
two cases: for q with fewer than 4000 bits and p much smaller than q**2,
x_divrem is faster than divmod_pos; for p very much larger than q**2,
x_divrem becomes faster again.  I put a bound in long_divrem to fall back
to x_divrem in those cases, based on tests on my laptop (Pentium 1.60GHz).

The treatment of exceptions is very rudimentary; I use a simple macro
STUB_EXC to return NULL, but without releasing memory.  Help with doing
this properly is welcome.

Please find attached the diff against the longobject.c file in
Python-2.6b3.

Mario

----------
nosy: +pernici
Added file: http://bugs.python.org/file11423/diff
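
For reference, a minimal Python sketch of the kind of benchmark described
above; the exponents, sizes and loop counts here are illustrative (not the
ones used for the table), and absolute timings depend on the machine:

```
import math
import timeit

def operands(q_digits):
    # choose exponents so q has about q_digits decimal digits and p about twice as many
    nq = int(round(q_digits / math.log10(5)))
    np = int(round(2 * q_digits / math.log10(7)))
    return 7 ** np, 5 ** nq

for q_digits in (500, 1000, 4000):
    p, q = operands(q_digits)
    t = min(timeit.repeat(lambda: divmod(p, q), number=20, repeat=3))
    print("q_digits=%6d: %.4f s for 20 divmod calls" % (q_digits, t))
```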
[issue3944] faster long multiplication

New submission from Pernici Mario <[EMAIL PROTECTED]>:

In this patch x_mul(a, b) uses fewer bit operations for a != b,
asymptotically half of them.

On the three computers I tried, the speed-up is around 5% for size=4 and
it increases up to 45-60% just below the Karatsuba cutoff, then it
decreases a bit above the cutoff (on one computer the speed-up is only
16% with KARATSUBA_CUTOFF=70, but raising the cutoff to 140, for which
the multiplication is also faster with the current code, the speed-up is
45%).

----------
files: longobject_diff
messages: 73624
nosy: pernici
severity: normal
status: open
title: faster long multiplication
type: performance
versions: Python 3.0
Added file: http://bugs.python.org/file11567/longobject_diff
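
To illustrate the general idea only (the patch itself works on C digit
arrays and has to keep the accumulated carry within fixed-width bounds,
which Python integers do not need), here is a sketch where the shift and
mask are applied once per output digit instead of once per partial
product; the function name is mine:

```
PyLong_SHIFT = 15
PyLong_MASK = (1 << PyLong_SHIFT) - 1

def x_mul_sketch(a, b):
    """Multiply two little-endian arrays of base-2**15 digits."""
    res = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] += ai * bj          # defer carry propagation
    carry = 0
    for k in range(len(res)):
        carry += res[k]
        res[k] = carry & PyLong_MASK       # one mask ...
        carry >>= PyLong_SHIFT             # ... and one shift per output digit
    assert carry == 0
    return res

# e.g. x_mul_sketch([PyLong_MASK], [PyLong_MASK]) == [1, 32766]
```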
[issue3944] faster long multiplication

Pernici Mario <[EMAIL PROTECTED]> added the comment:

Yes, I think that the speed-up comes from reducing the number of shifts
and masks.

Changing PyLong_SHIFT to 16 would be complicated; for instance, in
v_iadd() carry could no longer be a 16-bit digit.  Writing code specific
to 64-bit machines would surely improve performance; maybe with
PyLong_SHIFT=30 only a few changes to the code would be needed?

I did not modify the case a = b.

I changed the documentation, which was wrong, adding detailed bounds on
the carry in the various steps to check that it does not overflow.  I
corrected the wrong assertion (carry <= PyLong_MASK).

----------
keywords: +patch
Added file: http://bugs.python.org/file11595/longobject1.diff
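
A back-of-the-envelope check (my own arithmetic, not taken from the patch)
of how the digit width interacts with the carry bounds discussed above:
count how many digit-by-digit products can be summed in an unsigned
accumulator of a given width before it can overflow.

```
for shift, acc_bits in ((15, 32), (15, 64), (30, 64)):
    max_digit = (1 << shift) - 1
    max_product = max_digit * max_digit        # largest single partial product
    room = (1 << acc_bits) - 1                 # largest accumulator value
    print("SHIFT=%2d, %2d-bit accumulator: %d products fit"
          % (shift, acc_bits, room // max_product))
```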
[issue3944] faster long multiplication

Pernici Mario <[EMAIL PROTECTED]> added the comment:

Mark, following your suggestions about using bigger integer types, I
added code to convert Python numbers to arrays of twodigits when a 64-bit
integer type is supported and the numbers have size larger than 20;
otherwise the code of the previous patch is used.  This 64-bit integer is
used only inside the multiplication, so no modifications need to be made
in other parts of the Python code.

Now, for numbers with 300 decimal digits or more, the speedup is 2x on a
32-bit machine and 3x on a 64-bit machine (50% and 2x respectively for
squaring).

There is a macro HAVE_INT64 to control whether there is a 64-bit type;
the preprocessor instructions should be OK with gcc, but other compilers
might have a 64-bit type and not long long, so HAVE_INT64 would wrongly
not be defined and one would fall back to multiplying arrays of 16-bit
digits; these preprocessor instructions need to be fixed.

The speed difference for small integers is small; here is a summary of
some benchmarks on Pentium M 1.6GHz, Athlon XP 2600+ and Athlon 64 X2
Dual Core 3800+, all with Debian.  Speedup of this patch with respect to
the current revision (+ means the patch is faster):

  pybench SimpleIntegerArithmetic:  from -0.5% to +0.5%
  pybench SimpleLongArithmetic:     from -1% to +7%
  pystone:                          from +0.5% to +1.5%

Added file: http://bugs.python.org/file11653/longobject2.diff
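
A hypothetical sketch, in Python rather than the patch's C, of the
conversion step described above: pack pairs of base-2**15 digits into
base-2**30 "twodigits" so that the inner multiplication loop works on
half as many, wider digits (the helper name is mine):

```
PyLong_SHIFT = 15

def pack_twodigits(digits):
    """Pack a little-endian base-2**15 digit array into base-2**30 twodigits."""
    out = []
    for i in range(0, len(digits), 2):
        lo = digits[i]
        hi = digits[i + 1] if i + 1 < len(digits) else 0
        out.append(lo | (hi << PyLong_SHIFT))
    return out

# e.g. pack_twodigits([1, 2, 3]) == [1 + (2 << 15), 3]
```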
[issue3451] Asymptotically faster divmod and str(long)

Pernici Mario <[EMAIL PROTECTED]> added the comment:

In this patch I added, on top of the patch by Mark in issue 3944, the
code of the previous patch, modified to release memory in case of
exceptions.

Benchmark for division on Athlon 64 3800+ with respect to Python 3.0:
  (1) Python with patch 30bit.patch
  (2) Python with this patch

Up to 1000 digits, (1) and (2) perform in the same way.

  digits    (1)    (2)
  2000      4x     5x
  4000      4x     7x
  10000     4x     10x
  20000     4x     13x
  100000    4x     27x

----------
keywords: +patch
Added file: http://bugs.python.org/file11773/longobject2.diff
[issue4258] Use 30-bit digits instead of 15-bit digits for Python integers.

Pernici Mario added the comment:

The attached patch uses mul1 in long_mul, in the version patched with
30bit_longdigit13+optimizations.patch.

Comparison between these two patches on hp pavilion Q8200 2.33GHz:

  pybench                    patch   new patch
  SimpleIntegerArithmetic    89      85
  (other tests equal)

  pidigits_noprint 2000      998     947

----------
nosy: +pernici
Added file: http://bugs.python.org/file13137/30bit_longdigit13+optimizations1.patch
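
An illustrative micro-benchmark (names and operand sizes are mine) for
the kind of case a mul1 fast path in long_mul targets, i.e. one operand
fitting in a single internal digit; absolute timings are machine- and
build-dependent:

```
import timeit

big = 7 ** 5000
one_digit = 12345            # fits in one internal digit
many_digits = 7 ** 5000 + 3  # same size as big

print(timeit.timeit(lambda: big * one_digit, number=100000))
print(timeit.timeit(lambda: big * many_digits, number=1000))
```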
[issue3944] faster long multiplication

Pernici Mario added the comment:

This patch comes from 30bit_longdigit13+optimizations1.patch in issue
4258, with a modification for the case of multiplication by 0; it passes
test_long.py and pidigits is a bit faster.

----------
Added file: http://bugs.python.org/file13394/longobject_diff1
[issue3451] Asymptotically faster divmod and str(long)

Pernici Mario added the comment:

In this patch there is an implementation of the algorithm to convert
numbers to strings by recursively splitting the number in half, adapted
from Fredrik's div.py.

----------
Added file: http://bugs.python.org/file13496/longformat.diff
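
A minimal pure-Python sketch of the divide-and-conquer conversion
described above (the idea, not the code in the patch; the cutoff and the
function name are mine).  Note it only beats the straightforward
quadratic method when divmod itself is subquadratic, which is the point
of combining it with the BZ division patch in the next message:

```
import math

def int_to_decimal_string(n, cutoff=2 ** 64):
    if n < 0:
        return "-" + int_to_decimal_string(-n)
    if n < cutoff:
        return str(n)                    # small case: builtin conversion
    # 10**k has roughly half as many digits as n
    k = int(n.bit_length() * math.log10(2)) // 2
    hi, lo = divmod(n, 10 ** k)
    # the low half must be zero-padded to exactly k digits
    return int_to_decimal_string(hi) + int_to_decimal_string(lo).rjust(k, "0")

assert int_to_decimal_string(7 ** 1000) == str(7 ** 1000)
```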
[issue3451] Asymptotically faster divmod and str(long)

Pernici Mario added the comment:

This second patch adds, on top of the above patch, the recursive division
algorithm by Burnikel and Ziegler (BZ) from longobject2.diff, unmodified,
to see the effect of the subquadratic algorithm; there is still a lot of
work to be done on it, as Mark pointed out.

Here is a benchmark on hp pavilion Q8200 2.33GHz: a = 7**n; compute
str(a) N times.

  n          N        unpatched   patch1   patch2
  100        100000   0.25        0.25     0.25
  200        100000   0.63        0.64     0.63
  300        100000   1.19        1.22     1.23
  400        100000   1.87        1.74     1.75
  500        10000    0.27        0.24     0.24
  1000       10000    0.98        0.59     0.60
  2000       1000     0.36        0.16     0.17
  5000       1000     2.17        0.65     0.66
  10000      100      0.86        0.20     0.19
  20000      100      3.42        0.70     0.55
  50000      10       2.13        0.37     0.24
  100000     1        0.85        0.13     0.08
  500000     1        21          2.9      0.94
  1000000    1        85          11       2.8
  2000000    1        339         44       8.4

str(n) in the first patch still uses a quadratic algorithm, but
asymptotically it is almost 8 times faster than the unpatched version; in
the second patch the subquadratic algorithm starts showing after 10000
digits.

----------
Added file: http://bugs.python.org/file13497/longformat_BZ.diff
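
A sketch of how such a str() timing can be reproduced (the n and N pairs
here are illustrative; absolute times depend on the machine and on which
patches are applied):

```
import timeit

for n, N in ((1000, 10000), (20000, 100), (100000, 1)):
    a = 7 ** n
    t = timeit.timeit(lambda: str(a), number=N)
    print("n=%7d  N=%6d  %.2f s" % (n, N, t))
```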
[issue18005] faster modular exponentiation in some cases

New submission from Pernici Mario:

A trivial optimization can be made in ``pow(a, b, c)`` if ``b`` is even
and ``c - a < a``:

```
In [1]: c = (1 << 1000000) + 1

In [2]: a = c - 1234567

In [3]: b = 2

In [4]: %timeit pow(a, b, c)
1 loops, best of 3: 3.03 s per loop

In [5]: %timeit pow(c - a if c - a < (a >> 10) else a, b, c)
1000 loops, best of 3: 287 us per loop
```

This optimization is probably done in GMP, since with gmpy.mpz numbers
[5] is a bit slower than [4].

----------
messages: 189504
nosy: pernici
priority: normal
severity: normal
status: open
title: faster modular exponentiation in some cases
type: performance
versions: Python 2.7
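
The identity behind the trick: c - a is congruent to -a modulo c, and for
even b the sign cancels, so the smaller base gives the same result while
the intermediate values to be reduced stay much smaller.  A small check,
with an illustrative modulus size of my choosing:

```
c = (1 << 100000) + 1
a = c - 1234567
b = 2          # must be even for the identity below

# c - a is congruent to -a (mod c); for even b the sign disappears
assert (c - a) % c == (-a) % c
assert pow(a, b, c) == pow(c - a, b, c)

# guarded form, as in [5]: only switch base when c - a is much smaller than a
base = c - a if c - a < (a >> 10) else a
assert pow(base, b, c) == pow(a, b, c)
```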