[issue3451] Asymptotically faster divmod and str(long)

2008-09-08 Thread Pernici Mario

Pernici Mario <[EMAIL PROTECTED]> added the comment:

I have translated in C the algorithm for long division by
Burnikel and Ziegler (BZ), using the Python version fast_div.py
and the suggestions by Mark.

Here is a benchmark for divmod(p. q), p = 7**np, q = 5**nq
digits = q_digits = p_digits/2; the time taken by Python division
is normalized to 1
tests on Debian, BZ with Python-2.6b3
Pentium 1.60GHzAthlon XP 2600+ Athlon 64 3800+
digits  BZ  fast_div   BZ  fast_divBZ  fast_div
500 1.011.27   1.001.181.001.21
700 0.881.22   0.761.080.811.14
10000.821.17   0.721.040.761.10
20000.660.85   0.550.730.590.78
40000.510.62   0.430.520.450.56
1   0.320.38   0.310.370.330.39
2   0.240.25   0.230.250.250.27
10  0.140.14   0.110.110.120.12

BZ starts being faster than Python division around 2000 bits, apart
from two cases:
 for q with less than 4000 bits and p much smaller than q**2
 x_divrem is faster than divmod_pos;
 for p very much larger than q**2 x_divrem becomes
 again faster.
I put a bound in long_divrem to fall back to x_divrem in those cases,
based on tests on my laptop Pentium 1.60GHz.

The treatment of exceptions is very rudimentary;
I use a simple macro STUB_EXC to return NULL,
but without releasing memory.
Help for doing this properly is welcome.

Please find attached the diff to the longobject.c file in Python-2.6b3 .

Mario

--
nosy: +pernici
Added file: http://bugs.python.org/file11423/diff

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3451>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3944] faster long multiplication

2008-09-23 Thread Pernici Mario

New submission from Pernici Mario <[EMAIL PROTECTED]>:

In this patch x_mul(a, b) uses fewer bit operations for a != b,
asymptotically half of them.
On the three computers I tried the speed-up is around 5% for size=4
and it increases up to 45-60% just below the Karatsuba cutoff,
then it decreases a bit after this cutoff (on one computer the speed-up
is only 16% after KARATSUBA_CUTOFF=70, but raising the cutoff to 140,
for which with the current code the multiplication is also faster,
the speed-up is 45%).

--
files: longobject_diff
messages: 73624
nosy: pernici
severity: normal
status: open
title: faster long multiplication
type: performance
versions: Python 3.0
Added file: http://bugs.python.org/file11567/longobject_diff

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3944] faster long multiplication

2008-09-24 Thread Pernici Mario

Pernici Mario <[EMAIL PROTECTED]> added the comment:

Yes, I think that the speed-up is due to reducing the number of 
shifts and masks.

Changing PyLong_SHIFT to 16 would be complicated; for instance in 
v_iadd() carry could not be a digit of 16 bits anymore; writing code
specific for 64 bit machines would surely improve performance;
maybe with PyLong_SHIFT=30 few changes to the code would be needed?

I did not modify the case a = b.

I changed the documentation, which was wrong,
adding detailed bounds on carry
in the various steps to check that it does not overflow.
I corrected the wrong assertion (carry <= PyLong_MASK).

--
keywords: +patch
Added file: http://bugs.python.org/file11595/longobject1.diff

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3944] faster long multiplication

2008-09-29 Thread Pernici Mario

Pernici Mario <[EMAIL PROTECTED]> added the comment:

Mark, following your suggestions about using bigger integer types,
I added code to convert Python numbers to arrays of twodigits,
when a 64 bit integer type is supported, and for numbers with size
larger than 20; otherwise the code of the previous patch is used.
This 64 bit integer is used only inside multiplication, so no
modifications need to be made in other parts of the Python code.
Now with numbers with 300 decimal digits or more the speedup is
2x on 32 bit machine, 3x on 64 bit machine (50% and 2x respectively
for squaring).

There is a macro HAVE_INT64 to control if there is a 64 bit type;
the preprocessor instructions should be OK with gcc, but other
compilers might have a 64 bit type and not long long, so HAVE_INT64
is wrongly not defined and one falls back to multiplying arrays of
16 bit digits; these preprocessor instructions need to be fixed.

The speed difference for small integers is small;
here is a summary of some benchmarks on
Pentium M 1.6GHz, Athlon XP 2600+,
Athlon 64 X2 Dual Core 3800+, all with Debian;
speedup of this patch with respect to the current revision
(+ means the patch is faster):
In pybench, SimpleIntegerArithmetic: from -0.5% to +0.5%
SimpleLongArithmetic:  : from -1% to +7%
pystone: from +0.5% to +1.5%

Added file: http://bugs.python.org/file11653/longobject2.diff

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3451] Asymptotically faster divmod and str(long)

2008-10-13 Thread Pernici Mario

Pernici Mario <[EMAIL PROTECTED]> added the comment:

In this patch I added to the patch by Mark in issue 3944 the code
in the previous patch, modified to release memory in case of exceptions.

Benchmark for division on Athlon 64 3800+  with respect to Python3.0:
(1)  Python with patch 30bit.patch
(2)  Python with this patch
up to 1000 digits (1) and (2) perform in the same way
digits(1)(2)
2000  4x 5x
4000  4x 7x
1 4x 10x
2 4x 13x
104x 27x

--
keywords: +patch
Added file: http://bugs.python.org/file11773/longobject2.diff

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3451>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4258] Use 30-bit digits instead of 15-bit digits for Python integers.

2009-02-19 Thread Pernici Mario

Pernici Mario  added the comment:

The attached patch uses mul1 in long_mul in the version patched with
30bit_longdigit13+optimizations.patch

Comparison between these two patches on hp pavilion Q8200 2.33GHz

pybench  patch   new patch
SimpleIntegerArithmetic  89  85
other tests equal

pidigits_noprint 2000   998  947

--
nosy: +pernici
Added file: 
http://bugs.python.org/file13137/30bit_longdigit13+optimizations1.patch

___
Python tracker 
<http://bugs.python.org/issue4258>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3944] faster long multiplication

2009-03-22 Thread Pernici Mario

Pernici Mario  added the comment:

This patch comes from 30bit_longdigit13+optimizations1.patch in issue4258
with modification for the case of multiplication by 0; it passes
test_long.py and pidigits is a bit faster.

--
Added file: http://bugs.python.org/file13394/longobject_diff1

___
Python tracker 
<http://bugs.python.org/issue3944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3451] Asymptotically faster divmod and str(long)

2009-03-30 Thread Pernici Mario

Pernici Mario  added the comment:

In this patch there is an implementation of the algorithm to convert
numbers in strings by recursively splitting the number in half, adapted
from Fredrik's div.py

--
Added file: http://bugs.python.org/file13496/longformat.diff

___
Python tracker 
<http://bugs.python.org/issue3451>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3451] Asymptotically faster divmod and str(long)

2009-03-30 Thread Pernici Mario

Pernici Mario  added the comment:

In this second patch to the above patch it is added the recursive
division algorithm by Burnikel and Ziegler (BZ) 
from longobject2.diff, unmodified,
to see the effect of the subquadratic algorithm; there is still a lot of
work to be done on it, as Mark pointed out.

Here is a benchmark on hp pavilion Q8200  2.33GHz
a = 7**n; compute str(a) N times
nNunpatched  patch1   patch2
100  10   0.25   0.25 0.25
200  10   0.63   0.64 0.63
300  10   1.19   1.22 1.23
400  10   1.87   1.74 1.75
500  10.27   0.24 0.24
1000 10.98   0.59 0.60
2000 1000 0.36   0.16 0.17
5000 1000 2.17   0.65 0.66
1100  0.86   0.20 0.19
2100  3.42   0.70 0.55
510   2.13   0.37 0.24
10   10.85   0.13 0.08
50   121 2.9  0.94
100  185 11   2.8
200  133944   8.4

str(n) in the first patch uses a quadratic algorithm, but asymptotically
it is almost 8 times faster than the unpatched version;
in the second patch the subquadratic algorithm starts showing 
after 1 digits.

--
Added file: http://bugs.python.org/file13497/longformat_BZ.diff

___
Python tracker 
<http://bugs.python.org/issue3451>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18005] faster modular exponentiation in some cases

2013-05-18 Thread Pernici Mario

New submission from Pernici Mario:

A trivial optimization can be made in ``pow(a, b, c)``
if ``b`` is even and ``c - a < a``

```
In [1]: c = (1 << 100) + 1 

In [2]: a = c - 1234567

In [3]: b = 2

In [4]: %timeit pow(a, b, c)
1 loops, best of 3: 3.03 s per loop

In [5]: %timeit pow(c - a if c - a < (a >> 10) else a, b, c)
1000 loops, best of 3: 287 us per loop
```

This optimization is probably done in GMP, since using gmpy.mpz
[5] is a bit slower than [4].

--
messages: 189504
nosy: pernici
priority: normal
severity: normal
status: open
title: faster modular exponentiation in some cases
type: performance
versions: Python 2.7

___
Python tracker 
<http://bugs.python.org/issue18005>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com