Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?

2014-08-18 Thread Mark Dickinson
[Moderately off-topic]

On Sun, Aug 17, 2014 at 3:39 AM, Steven D'Aprano 
wrote:

> I used to refer to Python 4000 as the hypothetical compatibility break
> version. Now I refer to Python 5000.
>

I personally think it should be Python 500, or Py5M.  When we come to
create the Mercurial branch, that should of course, following tradition, be
called p5ym.

-- 
Mark


Re: [Python-Dev] Symmetry arguments for API expansion

2018-03-12 Thread Mark Dickinson
On Mon, Mar 12, 2018 at 4:49 PM, Raymond Hettinger <
raymond.hettin...@gmail.com> wrote:

> What is the proposal?
> * Add an is_integer() method to int(), Decimal(), Fraction(), and Real().
> Modify Rational() to provide a default implementation.
>

From the issue discussion, it sounds to me as though the OP would be
content with adding is_integer to int and Fraction (leaving the decimal
module and the numeric tower alone).


> Starting point: Do we need this?
> * We already have a simple, traditional, portable, and readable way to
> make the test:  int(x) == x
>

As already pointed out in the issue discussion, this solution isn't
particularly portable (it'll fail for infinities and nans), and can be
horribly inefficient in the case of a Decimal input with large exponent:

In [1]: import decimal
In [2]: x = decimal.Decimal('1e9')
In [3]: %timeit x == int(x)
1.42 s ± 6.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [4]: %timeit x == x.to_integral_value()
230 ns ± 2.03 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

> * In the context of ints, the test x.is_integer() always returns True.
> This isn't very useful.
>

It's useful in the context of duck typing, which I believe is a large part
of the OP's point. For a value x that's known to be *either* float or int
(which is not an uncommon situation), it makes x.is_integer() valid without
needing to know the specific type of x.
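
For concreteness, a rough sketch of that duck-typing case (an
int.is_integer() method did eventually land, in Python 3.12; on older
versions this only works when x is a float):

def half_if_integral(x):
    # x is known to be either an int or a float.
    if x.is_integer():
        return x / 2
    raise ValueError("expected an integral value")

print(half_if_integral(4.0))  # 2.0
print(half_if_integral(4))    # 2.0 on 3.12+; AttributeError on older versions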

> * It conflicts with a design goal for the decimal module to not invent new
> functionality beyond the spec unless essential for integration with the
> rest of the language.  The reasons included portability with other
> implementations and not trying to guess what the committee would have
> decided in the face of tricky questions such as whether
> Decimal('1.01').is_integer()
> should return True when the context precision is only three decimal places
> (i.e. whether context precision and rounding traps should be applied before
> the test and whether context flags should change after the test).
>

I don't believe there's any ambiguity here. The correct behaviour looks
clear: the context isn't used, no flags are touched, and the method returns
True if and only if the value is finite and an exact integer. This is
analogous to the existing is-sNaN, is-signed, is-finite, is-zero,
is-infinite tests, none of which are affected by (or affect) context.

-- 
Mark


Re: [Python-Dev] Symmetry arguments for API expansion

2018-03-13 Thread Mark Dickinson
On Mon, Mar 12, 2018 at 9:18 PM, Tim Peters  wrote:

> [Guido]
> >  as_integer_ratio() seems mostly cute (it has Tim Peters all
> > over it),
>
> Nope!  I had nothing to do with it.  I would have been -0.5 on adding
> it had I been aware at the time.
>

Looks like it snuck into the float type as part of the fractions.Fraction
work in https://bugs.python.org/issue1682. I couldn't find much related
discussion; I suspect that the move was primarily for optimization (see
https://github.com/python/cpython/commit/3ea7b41b5805c60a05e697211d0bfc14a62a19fb).
Decimal.as_integer_ratio was added here: https://bugs.python.org/issue25928

I do have significant uses of `float.as_integer_ratio` in my own code, and
wouldn't enjoy seeing it being deprecated/ripped out, though I guess I'd
cope.

Some on this thread have suggested that things like is_integer and
as_integer_ratio should be math module functions. Any suggestions for how
that might be made to work? Would we special-case the types we know about,
and handle only those (so the math module would end up having to know about
the fractions and decimal modules)? Or add a new magic method (e.g.,
__as_integer_ratio__) for each case we want to handle, like we do for
math.floor (__floor__), math.trunc (__trunc__) and math.ceil (__ceil__)? Or use some form of
single dispatch, so that custom types can register their own handlers? The
majority of current math module functions simply convert their arguments to
a float, so a naive implementation of math.is_integer in the same style
wouldn't work: it would give incorrect results for a non-integral Decimal
instance that ended up getting rounded to an integral value by the float
conversion.
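
To make the single-dispatch option concrete, a minimal sketch (purely
illustrative - the function name and the set of registered types are my
own choices, not part of any proposal):

from decimal import Decimal
from fractions import Fraction
from functools import singledispatch

@singledispatch
def is_integer(x):
    raise TypeError("is_integer not supported for %r" % type(x).__name__)

@is_integer.register(int)
def _(x):
    return True

@is_integer.register(float)
def _(x):
    return x.is_integer()

@is_integer.register(Fraction)
def _(x):
    return x.denominator == 1

@is_integer.register(Decimal)
def _(x):
    # Context is not consulted; no flags are touched.
    return x.is_finite() and x == x.to_integral_value()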

Mark


Re: [Python-Dev] Deprecating float.is_integer()

2018-03-21 Thread Mark Dickinson
I'd prefer to see `float.is_integer` stay. There _are_ occasions when one
wants to check that a floating-point number is integral, and on those
occasions, using `x.is_integer()` is the one obvious way to do it. I don't
think the fact that it can be misused should be grounds for deprecation.

As far as real uses: I didn't find uses of `is_integer` in our code base
here at Enthought, but I did find plenty of places where it _could_
reasonably have been used, and where something less readable like `x % 1 ==
0` was being used instead. For evidence that it's generally useful: it's
already been noted that the decimal module uses it internally. The mpmath
package defines its own "isint" function and uses it in several places: see
https://github.com/fredrik-johansson/mpmath/blob/2858b1000ffdd8596defb50381dcb83de2b6/mpmath/ctx_mp_python.py#L764.
MPFR also has an mpfr_integer_p predicate:
http://www.mpfr.org/mpfr-current/mpfr.html#index-mpfr_005finteger_005fp.

A concrete use-case: suppose you wanted to implement the beta function (
https://en.wikipedia.org/wiki/Beta_function) for real arguments in Python.
You'll likely need special handling for the poles, which occur only for
some negative integer arguments, so you'll need an is_integer test for
those. For small positive integer arguments, you may well want the accuracy
advantage that arises from computing the beta function in terms of
factorials (giving a correctly-rounded result) instead of via the log of
the gamma function. So again, you'll want an is_integer test to identify
those cases. (Oddly enough, I found myself looking at this recently as a
result of the thread about quartile definitions: there are links between
the beta function, the beta distribution, and order statistics, and the
(k-1/3)/(n+1/3) expression used in the recommended quartile definition
comes from an approximation to the median of a beta distribution with
integral parameters.)
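
To sketch that shape in code (illustrative only - no attempt at the
pole-cancellation subtleties or result signs that a real implementation
has to deal with; float inputs assumed):

import math

def beta(a, b):
    # Poles at non-positive integer arguments.
    if (a <= 0 and a.is_integer()) or (b <= 0 and b.is_integer()):
        raise ValueError("beta pole at non-positive integer argument")
    if a.is_integer() and b.is_integer():
        # Positive integers: exact factorials, then one correctly-rounded
        # division.
        return (math.factorial(int(a) - 1) * math.factorial(int(b) - 1)
                / math.factorial(int(a + b) - 1))
    # General case, via logs of the gamma function (sign handling omitted).
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))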

Or, you could look at the SciPy implementation of the beta function, which
does indeed do the C equivalent of is_integer in many places:
https://github.com/scipy/scipy/blob/11509c4a98edded6c59423ac44ca1b7f28fba1fd/scipy/special/cephes/beta.c#L67

In sum: it's an occasionally useful operation; there's no other obvious,
readable spelling of the operation that does the right thing in all cases,
and it's _already_ in Python! In general, I'd think that deprecation of an
existing construct should not be done lightly, and should only be done when
there's an obvious and significant benefit to that deprecation. I don't see
that benefit here.

-- 
Mark


Re: [Python-Dev] Deprecating float.is_integer()

2018-03-21 Thread Mark Dickinson
On Wed, Mar 21, 2018 at 8:49 PM, David Mertz  wrote:

> For example, this can be true (even without reaching inf):
>
> >>> x.is_integer()
> True
> >>> (math.sqrt(x**2)).is_integer()
> False
>

If you have a moment to share it, I'd be interested to know what value of
`x` you used to achieve this, and what system you were on. This can't
happen under IEEE 754 arithmetic.
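
(For the curious, an empirical spot-check of that claim - not a proof,
of course:)

import math
import random

for _ in range(100000):
    x = float(random.randrange(-10**15, 10**15))
    assert x.is_integer()
    assert math.sqrt(x ** 2).is_integer()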

-- 
Mark


Re: [Python-Dev] Using more specific methods in Python unit tests

2014-02-16 Thread Mark Dickinson
On Sun, Feb 16, 2014 at 12:22 AM, Nick Coghlan  wrote:

> The practical benefits of this kind of change in the test suite are
> also highly dubious, because they *only help if the test fails at some
> point in the future*. At that point, whoever caused the test to fail
> will switch into debugging mode, and a couple of relevant points
> apply:
>

One place where those points don't apply so cleanly is when the test
failure is coming from continuous integration and can't easily be
reproduced locally (e.g., because there's a problem on a platform you don't
have access to, or because it's some kind of threading-related intermittent
failure that's exacerbated by the timing conditions on a particular
machine).  In those situations, an informative error message can easily
save significant debugging time.

Count me as +1 on the test updates, provided they're done carefully.  (And
all those I've looked at from Serhiy do indeed look careful.)

-- 
Mark


Re: [Python-Dev] Language Summit notes

2014-04-17 Thread Mark Dickinson
On Wed, Apr 16, 2014 at 11:26 PM, Antoine Pitrou wrote:

> What does this mean exactly? Under OS X and Linux, Python is typically
> installed by default.


Under OS X, at least, I think there are valid reasons to not want to use
the system-supplied Python.  On my up-to-date OS X 10.9.2 machine, I see
Python 2.7.5, NumPy 1.6.2, Matplotlib 1.1.1 and Twisted 12.2.0.  For at
least Matplotlib and NumPy, those versions are pretty old (mid 2012), and
I'd be wary of updating them on the *system* Python: I have no idea what I
might or might not break.

-- 
Mark


Re: [Python-Dev] is the concept of 'reference ownership' no long applicable in Python 3.4?

2014-04-17 Thread Mark Dickinson
On Thu, Apr 17, 2014 at 4:34 PM, Jianfeng Mao wrote:

>  I noticed the following changes in the C API manuals from 3.3.5 (and
> earlier versions) to 3.4. I don’t know if these changes are deliberate and
> imply that we C extension developers no longer need to care about
> ‘reference ownership’ because of some improvements in 3.4. Could anyone
> clarify it?
>

AFAIK there's been no deliberate change to the notion of reference
ownership.  Moreover, any such change would break existing C extensions, so
it's highly unlikely that anything's changed here, behaviour-wise.

This looks like a doc build issue: when I build the documentation locally
for the default branch, I still see the expected "Return value: New
reference." lines.  Maybe something went wrong with refcounts.dat or the
Sphinx refcounting extension when building the 3.4 documentation?  Larry:
any ideas?

Mark


Re: [Python-Dev] is the concept of 'reference ownership' no long applicable in Python 3.4?

2014-04-17 Thread Mark Dickinson
On Thu, Apr 17, 2014 at 5:33 PM, Mark Dickinson  wrote:

> This looks like a doc build issue: when I build the documentation locally
> for the default branch, I still see the expected "Return value: New
> reference." lines.
>

Opened http://bugs.python.org/issue21286 for this issue.

-- 
Mark


Re: [Python-Dev] Informal educator feedback on PEP 572 (was Re: 2018 Python Language Summit coverage, last part)

2018-07-01 Thread Mark Dickinson
On Fri, Jun 22, 2018 at 7:28 PM, Chris Barker via Python-Dev <
python-dev@python.org> wrote:

>
> But once it becomes a more common idiom, students will see it in the wild
> pretty early in their path to learning python. So we'll need to start
> introducing it earlier than later.
>
> I think this reflects that the "smaller" a language is, the easier it is
> to learn.
>

For what it's worth, Chris's thoughts are close to my own here. I and
several of my colleagues teach week-long Python courses for Enthought. The
target audience is mostly scientists and data scientists (many of whom are
coming from MATLAB or R or IDL or Excel/VBA or some other development
environment, but some of whom are new to programming altogether), and our
curriculum is Python, NumPy, SciPy, Pandas, plus additional course-specific
bits and pieces (scikit-learn, NLTK, seaborn, statsmodels, GUI-building,
Cython, HPC, etc., etc.).

There's a constant struggle to keep the Python portion of the course large
enough to be coherent and useful, but small enough to allow time for the
other topics. To that end, we separate the Python piece of the course into
"core topics" that are essential for the later parts, and "advanced topics"
that can be covered if time allows, or if we get relevant questions. I
can't see a way that the assignment expression wouldn't have to be part of
the core topics. async stuff only appears in async code, and it's easy to
compartmentalize; in contrast, I'd expect that once the assignment
expression took hold we'd be seeing it in a lot of code, independent of the
domain.

And yes, I too see enough confusion with "is" vs == already, and don't
relish the prospect of teaching := in addition to those.

That's with my Python-teaching hat on. With my Python-developer hat on, my
thoughts are slightly different, but that's off-topic for this thread, and
I don't think I have anything to say that hasn't already been said many
times by others, so I'll keep quiet about that bit. :-)

-- 
Mark


Re: [Python-Dev] Semantics of __int__(), __index__()

2013-03-31 Thread Mark Dickinson
On Sun, Mar 31, 2013 at 2:29 PM, Mark Shannon  wrote:

> class Int1(int):
>     def __init__(self, val=0):
>         print("new %s" % self.__class__)
>
> class Int2(Int1):
>     def __int__(self):
>         return self
>
> and two instances
> i1 = Int1()
> i2 = Int2()
>
> we get the following behaviour:
>
> >>> type(int(i1))
> <class 'int'>
>
> I would have expected 'Int1'
>

Wouldn't that remove the one obvious way to get an 'int' from an 'Int1'?


> 1. Should type(int(x)) be exactly int, or is any subclass OK?
> 2. Should type(index(x)) be exactly int, or is any subclass OK?
> 3. Should int(x) be defined as int_check(x.__int__())?
> 4. Should operator.index(x) be defined as index_check(x.__index__())?
>

For (1), I'd say yes, it should be exactly an int, so my answer to (3) is
no.

As written, int_check would do the wrong thing for bools, too:  I
definitely want int(True) to be 1, not True.
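
Spelled out (bool being an int subclass is what bites here):

assert isinstance(True, int)   # so a bare int_check would pass True through
assert int(True) == 1
assert type(int(True)) is int  # the constructor returns a true int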

For (2) and (4), it's not so clear.  Are there use-cases for an __index__
return value that's not directly of type int?  I can't think of any offhand.

Mark


Re: [Python-Dev] Semantics of __int__(), __index__()

2013-04-02 Thread Mark Dickinson
On Tue, Apr 2, 2013 at 1:44 AM, Nick Coghlan  wrote:

> int() and operator.index() are both type coercion calls to produce true
> Python integers - they will never return a subclass, and this is both
> deliberate and consistent with all the other builtin types that accept an
> instance of themselves as input to the constructor.
>

That's good to hear.


> There's code in the slot wrappers so that if you return a non-int object
> from either __int__ or __index__, then the interpreter will complain about
> it, and if you return a subclass, it will be stripped back to just the base
> class.
>

Can you point me to that code?  All I could find was PyLong_Check calls (I
was looking for PyLong_CheckExact).

Mark


Re: [Python-Dev] Semantics of __int__(), __index__()

2013-04-02 Thread Mark Dickinson
On Tue, Apr 2, 2013 at 8:07 AM, Mark Dickinson  wrote:

> On Tue, Apr 2, 2013 at 1:44 AM, Nick Coghlan  wrote:
>
>> There's code in the slot wrappers so that if you return a non-int object
>> from either __int__ or __index__, then the interpreter will complain about
>> it, and if you return a subclass, it will be stripped back to just the base
>> class.
>>
>
> Can you point me to that code?  All I could find was PyLong_Check calls (I
> was looking for PyLong_CheckExact).
>

And indeed:

iwasawa:Objects mdickinson$ /opt/local/bin/python3.3
Python 3.3.0 (default, Sep 29 2012, 08:16:19)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class A:
...     def __int__(self):
...         return True
...     def __index__(self):
...         return False
...
>>> a = A()
>>> int(a)
True
>>> import operator; operator.index(a)
False

Which means I have to do int(int(a)) to get the actual integer value.  Grr.

Mark


Re: [Python-Dev] Semantics of __int__(), __index__()

2013-04-02 Thread Mark Dickinson
On Tue, Apr 2, 2013 at 9:33 AM, Mark Shannon  wrote:

>
> Hence my original question: what *should* the semantics be?
>
>
I like Nick's answer to that: int *should* always return something of exact
type int.  Otherwise you're always left wondering whether you have to do
"int(int(x))", or perhaps even "int(int(int(x)))", to be absolutely sure of
getting an int.

The question is whether / how to fix the current behaviour, given that it
doesn't conform to those ideal semantics.

Mark


Re: [Python-Dev] Semantics of __int__(), __index__()

2013-04-02 Thread Mark Dickinson
On Tue, Apr 2, 2013 at 9:58 AM, Maciej Fijalkowski  wrote:

>
> My 2 cents here is that which one is called seems to be truly random.
> Try looking into what builtin functions call (for example list.pop
> calls __int__, who knew)
>

That sounds like a clear bug to me.  It should definitely be using
__index__.

Mark


Re: [Python-Dev] Semantics of __int__(), __index__()

2013-04-02 Thread Mark Dickinson
On Tue, Apr 2, 2013 at 10:02 AM, Mark Dickinson  wrote:

> On Tue, Apr 2, 2013 at 9:58 AM, Maciej Fijalkowski wrote:
>
>>
>> My 2 cents here is that which one is called seems to be truly random.
>> Try looking into what builtin functions call (for example list.pop
>> calls __int__, who knew)
>>
>
> That sounds like a clear bug to me.  It should definitely be using
> __index__.
>

Ah, and I see it *is* using `__index__` in Python 3; just not in Python
2.7.  It may be one of those Python 2 bugs that's not worth fixing because
the fix would do more harm (in the form of breakage of existing code) than
good.

-- 
Mark


Re: [Python-Dev] Semantics of __int__(), __index__()

2013-04-03 Thread Mark Dickinson
On Wed, Apr 3, 2013 at 12:17 PM, Nick Coghlan  wrote:

> Perhaps we should start emitting a DeprecationWarning for int subclasses
> returned from __int__ and __index__ in 3.4?
>
+1.  Sounds good to me.

> (I like the idea of an explicit error over implicit conversion to the base
> type, so deprecation of subtypes makes sense as a way forward. We should
> check the other type coercion methods, too.)
>
Agreed on both points.

Mark


Re: [Python-Dev] Semantics of __int__(), __index__()

2013-04-06 Thread Mark Dickinson
On Fri, Apr 5, 2013 at 6:34 PM, Terry Jan Reedy  wrote:

> 2. int(rational): for floats, Fractions, and Decimals, this returns the
> integral part, truncating toward 0. Decimal and float have __int__ methods.
> Fractions, to my surprise, does not, so int must use __floor__ or __round__
> as a backup.
>

It uses __trunc__, which is supposed to be the unambiguous "Yes I really
want to throw away the fractional part and risk losing information"
replacement for __int__.  int() will try __int__ first, and then __trunc__,
as per PEP 3141.
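
A quick illustration of that lookup order (as it stood before Python 3.11
deprecated - and 3.12 removed - the __trunc__ fallback in int()):

class WithBoth:
    def __int__(self):
        return 1
    def __trunc__(self):
        return 2

class TruncOnly:
    def __trunc__(self):
        return 3

print(int(WithBoth()))   # 1: __int__ is tried first
print(int(TruncOnly()))  # 3: int() falls back to __trunc__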

Mark


Re: [Python-Dev] PEP 450 adding statistics module

2013-08-15 Thread Mark Dickinson
The PEP and code look generally good to me.

I think the API for median and its variants deserves some wider discussion:
the reference implementation has a callable 'median', and variant callables
'median.low', 'median.high', 'median.grouped'.  The pattern of attaching
the variant callables as attributes on the main callable is unusual, and
isn't something I've seen elsewhere in the standard library.  I'd like to
see some explanation in the PEP for why it's done this way.  (There was
already some discussion of this on the issue, but that was more centered
around the implementation than the API.)

I'd propose two alternatives for this:  either have separate functions
'median', 'median_low', 'median_high', etc., or have a single function
'median' with a "method" argument that takes a string specifying
computation using a particular method.  I don't see a really good reason to
deviate from standard patterns here, and fear that users would find the
current API surprising.

Mark



On Thu, Aug 15, 2013 at 2:25 AM, Steven D'Aprano wrote:

> Hi all,
>
> I have raised a tracker item and PEP for adding a statistics module to the
> standard library:
>
> http://bugs.python.org/issue18606
>
> http://www.python.org/dev/peps/pep-0450/
>
> There has been considerable discussion on python-ideas, which is now
> reflected by the PEP. I've signed the Contributor Agreement, and submitted
> a patch containing updated code and tests. The tests aren't yet integrated
> with the test runner but are runnable manually.
>
> Can I request that people please look at this issue, with an aim to ruling
> on the PEP and (hopefully) adding the module to 3.4 before feature freeze?
> If it is accepted, I am willing to be primary maintainer for this module in
> the future.
>
>
> Thanks,
>
>
> --
> Steven


Re: [Python-Dev] PEP 450 adding statistics module

2013-08-15 Thread Mark Dickinson
On Thu, Aug 15, 2013 at 2:25 AM, Steven D'Aprano wrote:

> Can I request that people please look at this issue, with an aim to ruling
> on the PEP and (hopefully) adding the module to 3.4 before feature freeze?
> If it is accepted, I am willing to be primary maintainer for this module in
> the future.
>

Bah.  I seem to have forgotten how to not top-post.  Apologies.  Please
ignore the previous message, and I'll try again...

The PEP and code look generally good to me.

I think the API for median and its variants deserves some wider discussion:
the reference implementation has a callable 'median', and variant callables
'median.low', 'median.high', 'median.grouped'.  The pattern of attaching
the variant callables as attributes on the main callable is unusual, and
isn't something I've seen elsewhere in the standard library.  I'd like to
see some explanation in the PEP for why it's done this way.  (There was
already some discussion of this on the issue, but that was more centered
around the implementation than the API.)

I'd propose two alternatives for this:  either have separate functions
'median', 'median_low', 'median_high', etc., or have a single function
'median' with a "method" argument that takes a string specifying
computation using a particular method.  I don't see a really good reason to
deviate from standard patterns here, and fear that users would find the
current API surprising.
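
For the second alternative, something along these lines (a rough sketch
with simplified placeholder computations, not the reference
implementation):

def median(data, method="average"):
    s = sorted(data)
    n = len(s)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if method == "average":
        return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2
    if method == "low":
        return s[mid] if n % 2 else s[mid - 1]
    if method == "high":
        return s[mid]
    raise ValueError("unknown method: %r" % (method,))

print(median([1, 2, 3, 4]))                  # 2.5
print(median([1, 2, 3, 4], method="low"))    # 2
print(median([1, 2, 3, 4], method="high"))   # 3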

Mark


Re: [Python-Dev] PEP 450 adding statistics module

2013-08-15 Thread Mark Dickinson
On Thu, Aug 15, 2013 at 2:08 PM, Steven D'Aprano wrote:

>
> - Each scheme ended up needing to be a separate function, for ease of both
> implementation and testing. So I had four private median functions, which I
> put inside a class to act as namespace and avoid polluting the main
> namespace. Then I needed a "master function" to select which of the methods
> should be called, with all the additional testing and documentation that
> entailed.
>

That's just an implementation issue, though, and sounds like a minor
inconvenience to the implementor rather than anything serious;  I don't
think that that should dictate the API that's used.

> - The API doesn't really feel very Pythonic to me. For example, we write:
> [...]

And I guess this is subjective:  conversely, the API you're proposing
doesn't feel Pythonic to me. :-)  I'd like the hear the opinion of other
python-dev readers.

Thanks for the detailed replies.  Would it be possible to put some of this
reasoning into the PEP?

Mark


Re: [Python-Dev] PEP 450 adding statistics module

2013-08-15 Thread Mark Dickinson
On Thu, Aug 15, 2013 at 6:48 PM, Ryan  wrote:

> For the naming, how about changing median(callable) to median.regular?
> That way, we don't have to deal with a callable namespace.
>

Hmm.  That sounds like a step backwards to me:  whatever the API is, a
simple "from statistics import median; m = median(my_data)" should still
work in the simple case.

Mark





>
> Steven D'Aprano  wrote:
>
>> On 15/08/13 21:42, Mark Dickinson wrote:
>>
>>> The PEP and code look generally good to me.
>>>
>>> I think the API for median and its variants deserves some wider discussion:
>>> the reference implementation has a callable 'median', and variant callables
>>> 'median.low', 'median.high', 'median.grouped'.  The pattern of attaching
>>> the variant callables as attributes on the main callable is unusual, and
>>> isn't something I've seen elsewhere in the standard library.  I'd like to
>>> see some explanation in the PEP for why it's done this way.  (There was
>>> already some discussion of this on the issue, but that was more centered
>>> around the implementation than the API.)
>>>
>>> I'd propose two alternatives for this:  either have separate functions
>>> 'median', 'median_low', 'median_high', etc., or have a single function
>>> 'median' with a "method" argument that takes a string specifying
>>> computation using a particular method.  I don't see a really good reason to
>>> deviate from standard patterns here, and fear that users would find the
>>> current API surprising.
>>
>>
>> Alexander Belopolsky has convinced me (off-list) that my current 
>> implementation is better changed to a more conservative one of a callable 
>> singleton instance with methods implementing the alternative computations. 
>> I'll have something like:
>>
>>
>> def _singleton(cls):
>>     return cls()
>>
>>
>> @_singleton
>> class median:
>>     def __call__(self, data):
>>         ...
>>     def low(self, data):
>>         ...
>>     ...
>>
>>
>> In my earlier stats module, I had a single median function that took an
>> argument to choose between alternatives. I called it "scheme":
>>
>> median(data, scheme="low")
>>
>> R uses a parameter
>> called "type" to choose between alternate calculations, not for median as we 
>> are discussing, but for quantiles:
>>
>> quantile(x, probs ... type = 7, ...).
>>
>> SAS also uses a similar system, but with different numeric codes. I rejected 
>> both "type" and "method" as the parameter name since it would cause 
>> confusion with the usual meanings of those words. I eventually decided 
>> against this system for two reasons:
>>
>> - Each scheme ended up needing to be a separate function, for ease of both 
>> implementation and testing. So I had four private median functions, which I 
>> put inside a class to act as namespace and avoid polluting the main 
>> namespace. Then I needed a "master function" to select which of the methods 
>> should be called, with all the additional testing and documentation that 
>> entailed.
>>
>> - The API doesn't really feel very Pythonic to me. For example, we write:
>>
>> mystring.rjust(width)
>> dict.items()
>>
>> rather than mystring.justify(width,
>> "right") or dict.iterate("items"). So I think individual methods is a better 
>> API, and one which is more familiar to most Python users. The only 
>> innovation (if that's what it is) is to have median a callable object.
>>
>>
>> As far as having four separate functions, median, median_low, etc., it just 
>> doesn't feel right to me. It puts four slight variations of the same 
>> function into the main namespace, instead of keeping them together in a 
>> namespace. Names like median_low merely simulates a namespace with 
>> pseudo-methods separated with underscores instead of dots, only without the 
>> advantages of a real namespace.
>>
>> (I treat variance and std dev differently, and make the sample and 
>> population forms separate top-level functions rather than methods, simply 
>> because they are so well-known from scientific calculators that it is 
>> unthinkable to me to do differently. Whenever I use numpy, I am surprised 
>> all over again that it has only a single variance function.)


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Mark Dickinson
> I am interested in making a non-trivial improvement to list.sort() [...]

Would your proposed new sorting algorithm be stable? The language
currently guarantees stability for `list.sort` and `sorted`.

-- 
Mark


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Mark Dickinson
On Sun, Sep 11, 2016 at 7:43 PM, Elliot Gorokhovsky
 wrote:
> So I suppose the thing to do is to benchmark stable radix sort against 
> timsort and see if it's still worth it.

Agreed; it would definitely be interesting to see benchmarks for the
two-array stable sort as well as the American Flag unstable sort.
(Indeed, I think it would be hard to move the proposal forward without
such benchmarks.)

Apart from the cases already mentioned by Chris, one of the situations
you'll want to include in the benchmarks is the case of a list that's
already almost sorted (e.g., an already sorted list with a few extra
unsorted elements appended). This is a case that does arise in
practice, and that Timsort performs particularly well on.
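
Something along these lines for the almost-sorted case, say (a sketch;
the sizes are arbitrary):

import random
import timeit

# Sorted list with a handful of unsorted elements appended.
data = list(range(10**6)) + [random.randrange(10**6) for _ in range(10)]
t = timeit.timeit("sorted(data)", globals=globals(), number=10)
print("almost-sorted case: %.3f s for 10 sorts" % t)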

-- 
Mark


Re: [Python-Dev] 64 bit units in PyLong

2017-07-05 Thread Mark Dickinson
On Mon, Jul 3, 2017 at 5:52 AM, Siyuan Ren  wrote:
> The current PyLong implementation represents arbitrary precision integers in
> units of 15 or 30 bits. I presume the purpose is to avoid overflow in
> addition , subtraction and multiplication. But compilers these days offer
> intrinsics that allow one to access the overflow flag, and to obtain the
> result of 64 bit multiplication as a 128 bit number. Or at least on x86-64,
> which is the dominant platform.  Any reason why it is not done?

Portability matters, so any use of these intrinsics would likely also
have to be accompanied by fallback code that doesn't depend on them,
as well as some buildsystem complexity to figure out whether those
intrinsics are supported or not. And then the Objects/longobject.c
would suffer in terms of simplicity and readability, so there would
have to be some clear gains to offset that. Note that the typical
Python workload does not involve thousand-digit integers: what would
matter would be performance of smaller integers, and it seems
conceivable that 64-bit limbs would speed up those operations simply
because so many more integers would become single-limb and so there
would be more opportunities to take fast paths, but there would need
to be benchmarks demonstrating that.

Oh, and you'd have to rewrite the power algorithm, which currently
depends on the number of bits in a limb being a multiple of 5. :-)

-- 
Mark


[Python-Dev] Re: What is __int__ still useful for?

2021-10-15 Thread Mark Dickinson
I'd propose that we relegate `__trunc__` to the same status as `__floor__`
and `__ceil__`: that is, have `__trunc__` limited to being support for
`math.trunc`, and nothing more. Right now the `int` constructor potentially
looks at all three of `__int__`, `__index__` and `__trunc__`, so the
proposal would be to remove that special role of `__trunc__` and reduce the
`int` constructor to only looking at `__int__` and `__index__`.

Obviously that's a backwards incompatible change, but a fairly mild one,
with an obvious place to insert a `DeprecationWarning` and a clear
transition path for affected code: code that relies on `int` being able to
use `__trunc__` would need to add a separate implementation of `__int__`.
(We made this change recently for the `Fraction` type in
https://bugs.python.org/issue44547.)
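
In code, the transition path looks something like this (names are
illustrative):

import math

class Quantity:
    def __init__(self, value):
        self._value = value

    def __trunc__(self):
        # Continues to support math.trunc(q).
        return math.trunc(self._value)

    def __int__(self):
        # Needed for int(q) once the constructor stops using __trunc__.
        return math.trunc(self._value)

q = Quantity(2.7)
print(math.trunc(q), int(q))  # 2 2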

I opened an issue for this proposal a few weeks back:
https://bugs.python.org/issue44977

Mark




On Thu, Oct 14, 2021 at 11:50 AM Serhiy Storchaka 
wrote:

> 14.10.21 12:24, Eryk Sun wrote:
> > Maybe an alternate constructor could be added -- such as
> > int.from_number() -- which would be restricted to calling __int__(),
> > __index__(), and __trunc__().
>
> See thread "More alternate constructors for builtin type" on Python-ideas:
>
> https://mail.python.org/archives/list/python-id...@python.org/thread/5JKQMIC6EUVCD7IBWMRHY7DRTTNSBOWG/


[Python-Dev] Re: What is __int__ still useful for?

2021-10-15 Thread Mark Dickinson
Meta: apologies for failing to trim the context in the previous post.

-- 
Mark


[Python-Dev] Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2021-12-30 Thread Mark Dickinson
tl;dr: I'd like to deprecate and eventually remove the option to use 15-bit 
digits in the PyLong implementation. Before doing so, I'd like to find out 
whether there's anyone still using 15-bit PyLong digits, and if so, why they're 
doing so.

History: the use of 30-bit digits in PyLong was introduced for Python 3.1 and 
Python 2.7, to improve performance of int (Python 3) / long (Python 2) 
arithmetic. At that time, we retained the option to use 15-bit digits, for two 
reasons:

- (1) use of 30-bit digits required C99 features (uint64_t and friends) at a 
time when we hadn't yet committed to requiring C99
- (2) it wasn't clear whether 30-bit digits would be a performance win on 
32-bit operating systems

Twelve years later, reason (1) no longer applies, and I suspect that:

- No-one is deliberately using the 15-bit digit option.
- There are few machines where using 15-bit digits is faster than using 30-bit 
digits.

But I don't have solid data on either of these suspicions, hence this post.

Removing the 15-bit digit option would simplify the code (there's significant 
mental effort required to ensure we don't break things for 15-bit builds when 
modifying Objects/longobject.c, and 15-bit builds don't appear to be exercised 
by the buildbots), remove a hidden compatibility trap (see b.p.o. issue 35037), 
widen the applicability of the various fast paths for arithmetic operations, 
and allow for some minor fast-path small-integer optimisations based on the 
fact that we'd be able to assume the presence of *two* extra bits in the C 
integer type rather than just one. As an example of the latter: if `a` and `b` 
are PyLongs that fit in a single digit, then with 15-bit digits and a 16-bit 
`digit` and `sdigit` type, `a + b` can't currently safely (i.e., without 
undefined behaviour from overflow) be computed with the C type `sdigit`. With 
30-bit digits and a 32-bit `digit` and `sdigit` type, `a + b` is safe.
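
The headroom arithmetic, spelled out:

for shift, ctype_bits in [(15, 16), (30, 32)]:
    max_digit = 2 ** shift - 1
    max_sum = 2 * max_digit                 # largest possible a + b
    max_sdigit = 2 ** (ctype_bits - 1) - 1  # largest value of the signed C type
    print(shift, "safe" if max_sum <= max_sdigit else "can overflow")

# Output:
# 15 can overflow   (65534 > 32767)
# 30 safe           (2147483646 <= 2147483647)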

Mark


*References*

Related b.p.o. issue: https://bugs.python.org/issue45569
MinGW compatibility issue: https://bugs.python.org/issue35037
Introduction of 30-bit digits: https://bugs.python.org/issue4258


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2021-12-31 Thread Mark Dickinson
Thanks all! So to summarize:

- 15-bit digits are still very much in use, and deprecating the option
would likely be premature at this point
- the main users are old 32-bit (x86), which it's difficult to care about
too much, and new 32-bit (principally ARM microarchitectures), which we
*do* care about

So my first suspicion is just downright wrong. In particular, the
decade-old logic that chooses 15-bit digits whenever SIZEOF_VOID_P < 8 is
still in place (albeit with a recent modification for WebAssembly).

For the second suspicion, that "There are few machines where using 15-bit
digits is faster than using 30-bit digits.", we need more data.

It looks as though the next step would be to run some integer-intensive
benchmarks on 32-bit ARM, with both --enable-big-digits=15 and
--enable-big-digits=30. If those show a win (or at least, not a significant
loss) for 30-bit digits, then there's a case for at least making 30-bit
digits the default, which would be a first step towards eventually
dropping 15-bit support altogether.

GPS: I'm not immediately seeing the ABI issue. If you're able to dig up
more information on that, I'd be interested to see it.

Mark


On Fri, Dec 31, 2021 at 3:33 AM Tim Peters  wrote:

> >> The reason for digits being a multiple of 5 bits should be revisited vs
> >> its original intent
>
> > I added that. The only intent was to make it easier to implement
> > bigint exponentiation easily ...
>
> That said, I see the comments in longintrepr.h note a stronger constraint:
>
> """
> the marshal code currently expects that PyLong_SHIFT is a multiple of 15
> """
>
> But that's doubtless also shallow.
>


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2021-12-31 Thread Mark Dickinson
On Fri, Dec 31, 2021 at 12:40 PM Skip Montanaro 
wrote:

> Perhaps I missed it, but maybe an action item would be to add a
> buildbot which configures for 15-bit PyLong digits.
>

Yep, good point. I was wrong to say that  "15-bit builds don't appear to be
exercised by the buildbots": there's a 32-bit Gentoo buildbot that's
(implicitly) using 15-bit digits, and the GitHub Actions Windows/x86 build
also uses 15-bit digits. I don't think we have anything that's explicitly
using the `--enable-big-digits` option, though.

-- 
Mark


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2022-01-02 Thread Mark Dickinson
On Sat, Jan 1, 2022 at 9:05 PM Antoine Pitrou  wrote:

> Note that ARM is merely an architecture with very diverse
> implementations having quite differing performance characteristics.  [...]
>

Understood. I'd be happy to see timings on a Raspberry Pi 3, say. I'm not
too worried about things like the RPi Pico - that seems like it would be
more of a target for MicroPython than CPython.

Wikipedia thinks, and the ARM architecture manuals seem to confirm, that
most 32-bit ARM instruction sets _do_ support the UMULL
32-bit-by-32-bit-to-64-bit multiply instruction. (From
https://en.wikipedia.org/wiki/ARM_architecture#Arithmetic_instructions:
"ARM supports 32-bit × 32-bit multiplies with either a 32-bit result or
64-bit result, though Cortex-M0 / M0+ / M1 cores don't support 64-bit
results.") Division may still be problematic.

-- 
Mark


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2022-01-14 Thread Mark Dickinson
On Sun, Jan 2, 2022 at 10:35 AM Mark Dickinson  wrote:

> Division may still be problematic.
>

On that note: Python divisions are somewhat crippled even on x64. Assuming
30-bit digits, the basic building block that's needed for multi-precision
division is a 64-bit-by-32-bit unsigned integer division, emitting a 32-bit
quotient (and ideally also a 32-bit remainder). And there's an x86/x64
instruction that does exactly that, namely DIVL. But without using inline
assembly, current versions of GCC and Clang apparently can't be persuaded
to emit that instruction from the longobject.c source - they'll use DIVQ (a
128-bit-by-64-bit division, albeit with the top 64 bits of the dividend set
to zero) on x64, and the __udivti3 or __udivmodti4 intrinsic on x86.

I was curious to find out what the potential impact of the failure to use
DIVL was, so I ran some timings. A worst-case target is division of a large
(multi-digit) integer by a single-digit integer (where "digit" means digit
in the sense of PyLong digit, not decimal digit), since that involves
multiple CPU division instructions in a fairly tight loop.

Results: on my laptop (2.7 GHz Intel Core i7-8559U, macOS 10.14.6,
non-optimised non-debug Python build), a single division of 10**1000 by 10
takes ~1018ns on the current main branch and ~722ns when forced to use the
DIVL instruction (by inserting inline assembly into the inplace_divrem1
function). IOW, forcing use of DIVL instead of DIVQ, in combination
with getting the remainder directly from the DIV instruction instead of
computing it separately, gives a 41% speedup in this particular worst case.
I'd expect the effect to be even more marked on x86, but haven't yet done
those timings.

For anyone who wants to play along, here's the implementation of the
inplace_divrem1 (in longobject.c) that I was using:

static digit
inplace_divrem1(digit *pout, digit *pin, Py_ssize_t size, digit n)
{
    digit remainder = 0;

    assert(n > 0 && n <= PyLong_MASK);
    while (--size >= 0) {
        twodigits dividend = ((twodigits)remainder << PyLong_SHIFT) | pin[size];
        digit quotient, high, low;
        high = (digit)(dividend >> 32);
        low = (digit)dividend;
        __asm__("divl %2\n"
                : "=a" (quotient), "=d" (remainder)
                : "r" (n), "a" (low), "d" (high)
        );
        pout[size] = quotient;
    }
    return remainder;
}


I don't know whether we *really* want to open the door to using inline
assembly for performance reasons in longobject.c, but it's interesting to
see the effect.

-- 
Mark


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2022-01-16 Thread Mark Dickinson
On Sat, Jan 15, 2022 at 8:12 PM Tim Peters  wrote:

> Something is missing here, but can't guess what without seeing the
> generated machine code.But I trust Mark will do that.
>

Welp, there goes my weekend. :-)

> $ python -m timeit -n 150 -s "x = 10**1000" "x//10"
> 150 loops, best of 5: 376 nsec per loop
>
> Which actually makes little sense to me. [...] Under 4 nsec per iteration
> seems close to impossibly fast on a 3.8GHz box, given the presence of any
> division instruction.
>
> However, dividing by 10 is not a worst case on this box. Dividing by
> 100 is over 3x slower:
>
> $ python -m timeit -n 150 -s "x = 10**1000" "x//100"
> 150 loops, best of 5: 1.25 usec per loop


Now *that* I certainly wasn't expecting. I don't see the same effect on
macOS / Clang, whether compiling with --enable-optimizations or not; this
appears to be a GCC innovation. And indeed, as Tim suggested, it turns out
that there's no division instruction present in the loop for the
division-by-10 case - we're doing division via multiplication by the
reciprocal. In Python terms, we're computing `x // 10` as
`(x * 0xcccccccccccccccd) >> 67`. Here's the tell-tale snippet of the
assembly output from the second compilation (the one that makes use of the
generated profile information) of longobject.c at commit
09087b8519316608b85131ee7455b664c00c38d2 on a Linux box, with GCC 11.2.0. I
added a couple of comments, but it's otherwise unaltered:

    .loc 1 1632 36 view .LVU12309
    movl    %r13d, %r11d
    salq    $2, %rbp
    cmpl    $10, %r13d     # compare divisor 'n' with 10, and
    jne     .L2797         # go to the slow version if n != 10
    leaq    1(%r10), %r9   # from here on, the divisor is 10
    addq    %rbp, %r8
.LVL3442:
    .loc 1 1632 36 view .LVU12310
    addq    %rbp, %rdi
.LVL3443:
    .loc 1 1632 36 view .LVU12311
.LBE8049:
    .loc 1 1624 15 view .LVU12312
    xorl    %r13d, %r13d
.LVL3444:
    .loc 1 1624 15 view .LVU12313
    movabsq $-3689348814741910323, %r11  # magic constant 0xcccccccccccccccd
                                         # for division by 10
and then a few lines later:

    .loc 1 1630 9 is_stmt 1 view .LVU12316
    .loc 1 1631 9 view .LVU12317
    .loc 1 1631 39 is_stmt 0 view .LVU12318
    movl    (%r8,%r10,4), %r14d  # move top digit of the dividend into
                                 # the low word of r14
.LVL3446:
    .loc 1 1632 9 is_stmt 1 view .LVU12319
    movq    %r14, %rax           # set up for division: top digit now in rax
    .loc 1 1633 13 is_stmt 0 view .LVU12320
    movq    %r14, %r13
    mulq    %r11                 # here's the division by 10: multiply by
                                 # the magic constant
    shrq    $3, %rdx             # and divide by 8 (via a shift)

and then it all gets a bit repetitive and boring - there's a lot of loop
unrolling going on.
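
(As a sanity check on the reciprocal trick, the shift-by-67 identity is
easy to verify from Python:)

import random

MAGIC = 0xcccccccccccccccd  # ceil(2**67 / 10)
for x in [0, 1, 9, 10, 2**64 - 1] + [random.getrandbits(64) for _ in range(10**5)]:
    assert (x * MAGIC) >> 67 == x // 10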

So gcc is anticipating divisions by 10 and introducing special-case
divide-by-reciprocal-multiply code for that case, and presumably the
profile generated for the PGO backs up this being a common enough case, so
we end up with the above code in the final compilation.

TIL ...

-- 
Mark


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2022-01-16 Thread Mark Dickinson
On Sun, Jan 16, 2022 at 4:11 PM Terry Reedy  wrote:

>
>
> https://stackoverflow.com/questions/41183935/why-does-gcc-use-multiplication-by-a-strange-number-in-implementing-integer-divi
>
> and
>
>
> https://stackoverflow.com/questions/30790184/perform-integer-division-using-multiplication
>
> have multiple discussions of the technique for machine division
> invariant (small) ints and GCC's use thereof (only suppressed with -0s?).
>

Yes, it's an old and well-known technique, and compilers have been using it
for division by a known-at-compile-time constant for many decades. What's
surprising here is the use by GCC in a situation where the divisor is *not*
known at compile time - that GCC essentially guesses that a divisor of 10 is
common enough to justify special-casing.

There's also the libdivide library[1], which caters to situations where you
have a divisor not known at compile time but you know you're going to be
using it often enough to compensate for the cost of computing the magic
multiplier dynamically at run time.

[1] https://libdivide.com

-- 
Mark


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2022-01-16 Thread Mark Dickinson
On Sun, Jan 16, 2022 at 12:08 PM Mark Dickinson  wrote:

> So gcc is anticipating divisions by 10 and introducing special-case
> divide-by-reciprocal-multiply code for that case, and presumably the
> profile generated for the PGO backs up this being a common enough case, so
> we end up with the above code in the final compilation.
>

Nope, that's not what's happening. This analysis is backwards, and unfairly
attributes to GCC the apparently arbitrary choice to optimise division by
10. But it's not GCC's fault; it's ours. What's *actually* happening is
that GCC is simply recording values for n used in calls to divrem1 (via the
-fprofile-values option, which is implied by -fprofile-generate, which is
used as a result of the --enable-optimizations configure script option).
It's then noticing that in our profile task (which consists of a selection
of Lib/test/test_*.py test files) we most often do divisions by 10, and so
it optimizes that case.

To test this hypothesis I added a large number of tests for division by 17
in test_long.py, and then recompiled from scratch (again with
--enable-optimizations). Here are the results:

root@341b5fd44b23:/home/cpython# ./python -m timeit -n 100 -s
"x=10**1000; y=10" "x//y"

100 loops, best of 5: 1.14 usec per loop

root@341b5fd44b23:/home/cpython# ./python -m timeit -n 100 -s
"x=10**1000; y=17" "x//y"

100 loops, best of 5: 306 nsec per loop

root@341b5fd44b23:/home/cpython# ./python -m timeit -n 100 -s
"x=10**1000; y=1" "x//y"

100 loops, best of 5: 1.14 usec per loop

root@341b5fd44b23:/home/cpython# ./python -m timeit -n 100 -s
"x=10**1000; y=2" "x//y"

100 loops, best of 5: 1.15 usec per loop

As expected, division by 17 is now optimised; division by 10 is as slow as
division by other small scalars.

-- 
Mark


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2022-01-16 Thread Mark Dickinson
On Sun, Jan 16, 2022 at 9:28 PM Guido van Rossum  wrote:

> Does the optimization for //10 actually help in the real world? [...]
>

Yep, I don't know. If 10 is *not* the most common small divisor in real
world code, it must at least rank in the top five. I might hazard a guess
that division by 2 would be more common, but I've no idea how one would go
about establishing that.

The reason that the divisor of 10 is turning up from the PGO isn't a
particularly convincing one - it looks as though it's a result of our
testing the builtin int-to-decimal-string conversion by comparing with an
obviously-correct repeated-division-by-10 algorithm.

Then again I'm not sure what's *lost* even if this optimization is
> pointless -- surely it doesn't slow other divisions down enough to be
> measurable.
>

Agreed. That at least is testable. I can run some timings (but not tonight).

-- 
Mark


[Python-Dev] Should we require IEEE 754 floating-point for CPython?

2022-02-07 Thread Mark Dickinson
On Mon, Feb 7, 2022 at 5:11 PM Victor Stinner  wrote:

> I made a change to require C99  "NAN" constant [...]


There's a separate discussion topic lurking here. It's equally in need of
discussion here (IMO), but it's orthogonal to the "should we require C99"
discussion. I've changed the subject line accordingly to try to avoid
derailing that discussion.

Unlike the other things Victor mentions ("copysign", "round", etc.), the
NAN macro is not required to be present by C99. Instead, the standard says
that "NAN is defined if and only if the implementation supports quiet NaNs
for the float type" (C99 §7.12p5).

Victor is proposing in GH-31160
 to require the presence of
the NAN macro in order for CPython to build, which under C99 is equivalent
to requiring that the C float type supports quiet NaNs. That's not the same
as requiring IEEE 754 floating-point, but it's not far off - there aren't
many non-IEEE 754 floating-point formats that support NaNs. (Historically,
there are essentially none, but it seems quite likely that there will be at
least some non-IEEE 754 formats in the future that support NaNs; Google's
bfloat16 format is one example.)
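
For context, a "quiet" NaN is one that propagates through arithmetic
instead of trapping, which is observable from Python on any platform
that supports them:

>>> x = float("nan")
>>> x + 1.0, x == x
(nan, False)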

So there are (at least) three questions here:

- Should we require the presence of NaNs in order for CPython to build?
- Should we require IEEE 754 floating-point for CPython-the-implementation?
- Should we require IEEE 754 floating-point for Python-the-language?

For the first two, I'd much prefer either to not require NaNs, or to go the
whole way and require IEEE 754 for CPython. Requiring NaNs but not IEEE 754
feels like an awkward halfway house: in practice, it would be just as
restrictive as requiring IEEE 754, but without the benefits of making that
requirement explicit (e.g., being able to get rid of non-IEEE 754 paths in
existing code, and being able to tell users that they can reasonably expect
IEEE 754-conformant behaviour).

Note that on the current main branch there's a Py_NO_NAN macro that
builders can define to indicate that NaNs aren't supported, but the Python
build is currently broken if Py_NO_NAN is defined (see
https://bugs.python.org/issue46656). If the answer to the first question is
"No", then we need to fix the build under Py_NO_NAN. That's not a big deal
- perhaps a couple of hours of work.

-- 
Mark


[Python-Dev] Re: PEP 682: Format Specifier for Signed Zero

2022-03-06 Thread Mark Dickinson
PEP 682 (Format Specifier for Signed Zero) has been accepted! Please see
https://discuss.python.org/t/accepting-pep-682-format-specifier-for-signed-zero/14088

Thanks to all involved,

Mark


Re: [Python-Dev] negative PyLong integer -> unsigned integer, TypeError or OverflowError?

2009-02-06 Thread Mark Dickinson
On Fri, Feb 6, 2009 at 9:04 PM, Lisandro Dalcin  wrote:
> At Objects/longobject.c, you should see that in almost all cases
> OverflowError is raised when a unsigned integral is requested from a
> negative PyLong. However, See this one:
> [...]
>   if (!is_signed) {
>   PyErr_SetString(PyExc_TypeError,
>   "can't convert negative long to unsigned");
>   return -1;
>   }

I agree that TypeError seems wrong here.

Please could you file a bug report at bugs.python.org?

Mark


Re: [Python-Dev] negative PyLong integer -> unsigned integer, TypeError or OverflowError?

2009-02-07 Thread Mark Dickinson
On Fri, Feb 6, 2009 at 11:38 PM, Lisandro Dalcin  wrote:
> Done, http://bugs.python.org/issue5175

Thank you!

Mark


Re: [Python-Dev] Tracker archeology

2009-02-10 Thread Mark Dickinson
On Tue, Feb 10, 2009 at 1:23 PM, Daniel (ajax) Diniz  wrote:
> If anyone is interested in being added as nosy for any category of
> bugs, let me know and I'll do that as I scan the tracker.

Feel free to assign anything math-related (math and cmath modules,
float and complex objects) to me.

Thanks for this!

Mark


Re: [Python-Dev] Adding T_SIZET to structmember.h

2009-02-13 Thread Mark Dickinson
On Thu, Feb 12, 2009 at 8:42 PM, Lisandro Dalcin  wrote:
> I would like to propose the inclusion of a new T_SIZET in structmember.h
> in order to support 'size_t' struct fields with PyMemberDef. Would such
> addition be accepted for 2.7 and 3.1?

Please open a feature request at bugs.python.org, and we'll find out!  A
working patch would probably be helpful.

(It sounds like a sensible addition to me.)

Mark


[Python-Dev] 30-bit PyLong digits in 3.1?

2009-02-17 Thread Mark Dickinson
A few months ago there was a discussion [1] about changing
Python's long integer type to use base 2**30 instead of base
2**15.  http://bugs.python.org/issue4258 was opened for this.

With much help from many people (but especially Antoine
and Victor), I've finally managed to put together an
essentially finished patch for this (see 30bit_longdigit14.patch
in the tracker).

I'd like to get this in for 3.1. Any objections or comments?
Is this PEP territory?

Summary of the patch:

* Apart from improved performance, the effects should be
  almost entirely invisible to users.

* By default, 30-bit digits are used only when both 32-bit
  and 64-bit integer types are available; otherwise the
  code falls back to the usual 15-bit digits.  For Unix, there's
  a configure option --enable-big-digits that overrides this
  default.  In particular, you can use --disable-big-digits
  to force 15-bit digit longs.

* There's a new structseq sys.int_info that looks like this:

>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)

  the sizeof_digit is mostly there to help out the sys.getsizeof
  tests in test_sys.

* Benchmarks show significant speedups (20% and more)
  for integer arithmetic
  on 64-bit systems, and lesser speedups on 32-bit systems.
  Operations with single-digit integers aren't affected much
  either way;  most of the benefit seems to be for operations
  with small multi-digit integers.

* There are more performance improvements planned (see
   the issue discussion for details);  I left them out of the
   current patch for simplicity, and because they still need
   proper testing and benchmarking.

Mark

[1] http://mail.python.org/pipermail/python-dev/2008-November/083315.html


Re: [Python-Dev] 30-bit PyLong digits in 3.1?

2009-02-17 Thread Mark Dickinson
On Tue, Feb 17, 2009 at 7:49 PM, "Martin v. Löwis"  wrote:
> Can you please upload it to Rietveld also?

Will do.  I'm getting a "500 Server Error" at the moment, but I'll keep trying.

Mark


Re: [Python-Dev] 30-bit PyLong digits in 3.1?

2009-02-17 Thread Mark Dickinson
On Tue, Feb 17, 2009 at 8:42 PM, Guido van Rossum  wrote:
> Use the upload.py script (/static/upload.py) rather than the Create Issue 
> page.

Thanks.  That worked.

http://codereview.appspot.com/14105


Re: [Python-Dev] Attention Bazaar mirror users

2009-02-25 Thread Mark Dickinson
On Wed, Feb 25, 2009 at 2:23 PM, Barry Warsaw  wrote:
> This is now done.  Please let me know if you have any problems with the
> mirrors.

Is the cron job that's supposed to update the bzr repository still running?
I'm getting 'No revisions to pull' when I do 'bzr pull' for the py3k branch:

Macintosh-3:py3k dickinsm$ bzr pull
Using saved parent location: http://code.python.org/python/py3k/
No revisions to pull.

...which is a bit surprising, since my last 'bzr pull' was a while ago.
Do I need to update something somewhere?

I'm using bzr version 1.11 from macports.

Mark


Re: [Python-Dev] Attention Bazaar mirror users

2009-02-27 Thread Mark Dickinson
On Fri, Feb 27, 2009 at 7:26 PM, Barry Warsaw  wrote:
> On Feb 25, 2009, at 2:03 PM, Mark Dickinson wrote:
>> Is the cron job that's supposed to update the bzr repository still
>> running?
> I think I have this fixed now.  The branch updater is running on dinsdale
> now, but I'm currently staggering it, so that every 5 minutes the 2.5, 2.6,
> trunk, py3k and 3.0 branches get updated in a round-robin.

Seems to be working for me.  Thanks!

Mark


Re: [Python-Dev] speeding up PyObject_GetItem

2009-03-24 Thread Mark Dickinson
2009/3/24 Daniel Stutzbach :
> [...]
> 100 nanoseconds, py3k trunk:
> ceval -> PyObject_GetItem (object.c) -> list_subscript (listobject.c) ->
> PyNumber_AsSsize_t (object.c) -> PyLong_AsSsize_t (longobject.c)
> [more timings snipped]

Does removing the PyLong_Check call in PyLong_AsSsize_t
make any noticeable difference to these timings?

Mark


Re: [Python-Dev] speeding up PyObject_GetItem

2009-03-24 Thread Mark Dickinson
On Tue, Mar 24, 2009 at 3:50 PM, Daniel Stutzbach wrote:
> On Tue, Mar 24, 2009 at 10:13 AM, Mark Dickinson  wrote:
>> Does removing the PyLong_Check call in PyLong_AsSsize_t
>> make any noticeable difference to these timings?
>
> Making no other changes from the trunk, removing the PyLong_Check and NULL
> check from PyLong_AsSsize_t shaves off 4 nanoseconds (or around 4% since the
> trunk is around 100 nanoseconds).

Thanks.  I'd call that a noticeable difference.  I'd be +1 on changing
this particular check to an assert and so disabling it in non-debug builds.
I'd like to bet that the majority of calls to PyLong_AsSsize_t are
internal.

Mark


Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-06 Thread Mark Dickinson
[Antoine]
> - Issue #5593: code like 1e16+2. is optimized away and its result stored 
> as
> a constant (again), but the result can vary slightly depending on the internal
> FPU precision.
[Guido]
> I would just not bother constant folding involving FP, or only if the
> values involved have an exact representation in IEEE binary FP format.

+1 for removing constant folding for floats (besides conversion
of unary minus).  There are just too many things to worry about:
FPU rounding mode and precision, floating-point signals and flags,
effect of compiler flags, and the potential benefit seems small.

Mark


Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-06 Thread Mark Dickinson
On Mon, Apr 6, 2009 at 9:05 PM, Raymond Hettinger  wrote:
> The code for the lsum() recipe is more readable with a line like:
>
>  exp = long(mant * 2.0 ** 53)
>
> than with
>
>  exp = long(mant * 9007199254740992.0)
>
> It would be ashamed if code written like the former suddenly
> started doing the exponentation in the inner-loop or if the code
> got rewritten by hand as shown.

Well, I'd say that the obvious solution here is to compute
the constant 2.0**53 just once, somewhere outside the
inner loop.  In any case, that value would probably be better
written as 2.0**DBL_MANT_DIG (or something similar).
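
That is, something along these lines (a sketch; sys.float_info.mant_dig
is the Python-level spelling of DBL_MANT_DIG, and the names are mine):

import sys

# compute the scale factor once, outside any inner loop
SCALE = 2.0 ** sys.float_info.mant_dig   # 2.0 ** 53 for IEEE doubles

def scale(mant):
    return int(mant * SCALE)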

As Antoine reported, the constant-folding caused quite
a confusing bug report (issue #5593):  the problem (when
we eventually tracked it down) was that the folded
constant was in a .pyc file, and so wasn't updated when
the compiler flags changed.

Mark


[Python-Dev] Shorter float repr in Python 3.1?

2009-04-07 Thread Mark Dickinson
Executive summary (details and discussion points below)
========================================================

Some time ago, Noam Raphael pointed out that for a float x,
repr(x) can often be much shorter than it currently is, without
sacrificing the property that eval(repr(x)) == x, and proposed
changing Python accordingly.  See

http://bugs.python.org/issue1580

For example, instead of the current behaviour:

Python 3.1a2+ (py3k:71353:71354, Apr  7 2009, 12:55:16)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.029999999999999999
>>> 0.04
0.040000000000000001
>>> 0.04 == eval(repr(0.04))
True

we'd have this:

Python 3.1a2+ (py3k-short-float-repr:71350:71352M, Apr  7 2009, )
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.03
>>> 0.04
0.04
>>> 0.04 == eval(repr(0.04))
True

Initial attempts to implement this encountered various
difficulties, and at some point Tim Peters pointed out
(I'm paraphrasing horribly here) that one can't have all
three of {fast, easy, correct}.

One PyCon 2009 sprint later, Eric Smith and I have
produced the py3k-short-float-repr branch, which implements
short repr of floats and also does some major cleaning
up of the current float formatting functions.
We've gone for the {fast, correct} pairing.
We'd like to get this into Python 3.1.

Any thoughts/objections/counter-proposals/...?

More details

Our solution is based on an adaptation of David Gay's
'perfect rounding' code for inclusion in Python.  To make
eval(repr(x)) roundtripping work, one needs to have
correctly rounded float -> decimal *and* decimal -> float
conversions:  Gay's code provides correctly rounded
dtoa and strtod functions for these two conversions.
His code is well-known and well-tested:  it's used as the
basis of the glibc strtod, and is also in OS X.  It's
available from

http://www.netlib.org/fp/dtoa.c

So our branch contains a new file Python/dtoa.c,
which is a cut down version of Gay's original file. (We've
removed stuff for VAX and IBM floating-point formats,
hex NaNs, hex floating-point formats, locale-aware
interpretation of the decimal separator, K&R headers,
code for correct setting of the inexact flag, and various
other bits and pieces that Python doesn't care about.)

Most of the rest of the work is in the existing file
Python/pystrtod.c.  Every float -> string or string -> float
conversion goes through a function in this file at
some point.

Gay's code also provides the opportunity to clean
up the current float formatting code, and Eric has
reworked a lot of the float formatting in the py3k-short-float-repr
branch.  This reworking should make finishing off the
implementation of things like thousands separators much
more straightforward.

One example of this:  the previous string -> float conversion
used the system strtod, which is locale-aware, so the code
had to first replace the '.' by the current locale's decimal
separator, *then* call strtod.  There was a similar dance in
the reverse direction when doing float -> string conversion.
Both these are now unnecessary.

The current code is pretty close to ready for merging
to py3k.  I've uploaded a patchset to Rietveld:

http://codereview.appspot.com/33084/show

Apart from the short float repr, and a couple of bugfixes,
all behaviour should be unchanged from before.  There
are a few exceptions:

 - format(1e200, '<') doesn't behave quite as it did
   before.  See item (3) below for details

 - repr switches to using exponential notation at
   1e16 instead of the previous 1e17.  This avoids
   a subtle issue where the 'short float repr' result
   is padded with bogus zeros.

 - a similar change applies to str, which switches
   to exponential notation at 1e11, not 1e12.  This
   fixes the following minor annoyance, which goes
   back at least as far as Python 2.5 (and probably
   much further):

   >>> x = 1e11 + 0.5
   >>> x
   100000000000.5
   >>> print(x)
   100000000000.0

That .0 seems wrong to me:  if we're going to
go to the trouble of printing extra digits (str
usually only gives 12 significant digits; here
there are 13), they should be the *right* extra digits.
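
The roundtrip property itself is easy to spot-check on random bit
patterns (a quick sketch, not part of the branch's test suite):

import math, random, struct

for _ in range(100000):
    bits = random.getrandbits(64)
    x = struct.unpack('<d', struct.pack('<Q', bits))[0]
    if not (math.isnan(x) or math.isinf(x)):
        assert eval(repr(x)) == x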

Discussion points
=================

(1) Any objections to including this into py3k?  If there's
controversy, then I guess we'll need a PEP.

(2) Should other Python implementations (Jython,
IronPython, etc.) be expected to use short float repr, or should
it just be considered an implementation detail of CPython?
I propose the latter, except that all implementations should
be required to satisfy eval(repr(x)) == x for finite floats x.

(3) There's a PEP 3101 line we don't know what to do with.
In py3k, we currently have:

>>> format(1e200, '<')
'1.0e+200'

but in our py3k-short-float-repr branch:

>>> format(1e200, '<')
'1e+200'

Which is correct? The py3k behaviour
comes from the

Re: [Python-Dev] slightly inconsistent set/list pop behaviour

2009-04-07 Thread Mark Dickinson
On Wed, Apr 8, 2009 at 7:13 AM, John Barham  wrote:
> If you play around a bit it becomes clear that what set.pop() returns
> is independent of the insertion order:

It might look like that, but I don't think this is
true in general (at least, with the current implementation):

>>> foo = set([1, 65537])
>>> foo.pop()
1
>>> foo = set([65537, 1])
>>> foo.pop()
65537
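
The pair 1 and 65537 isn't an accident: in CPython both hash to the
same slot of the initial 8-slot table, so which one pop() finds first
depends on insertion order.  A quick way to see the collision:

>>> hash(1) % 8, hash(65537) % 8
(1, 1)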

Mark


Re: [Python-Dev] Test failure on Py3k branch

2009-04-11 Thread Mark Dickinson
On Sat, Apr 11, 2009 at 11:14 AM, Chris Withers  wrote:
> Also got the following failure from a py3k checkout:
>
> test test_cmd_line failed -- Traceback (most recent call last):
>  File "/Users/chris/py3k/Lib/test/test_cmd_line.py", line 143, in
> test_run_code
>    0)
> AssertionError: 1 != 0

Are you on OS X?  This looks like

http://bugs.python.org/issue4388

Mark


Re: [Python-Dev] Python 2.6.2 final

2009-04-11 Thread Mark Dickinson
On Fri, Apr 10, 2009 at 2:31 PM, Barry Warsaw  wrote:
> bugs.python.org is apparently down right now, but I set issue 5724 to
> release blocker for 2.6.2.  This is waiting for input from Mark Dickinson,
> and it relates to test_cmath failing on Solaris 10.

I'd prefer to leave this alone for 2.6.2.  There's a fix posted to the issue
tracker, but it's not entirely trivial and I think the risk of accidental
breakage outweighs the niceness of seeing 'all tests passed' on
Solaris.

Mark


Re: [Python-Dev] Shorter float repr in Python 3.1?

2009-04-14 Thread Mark Dickinson
On Tue, Apr 14, 2009 at 9:45 AM, Ned Deily  wrote:
>  Ned Deily  wrote:
>>  Eric Smith  wrote:
>> > Before then, if anyone could build and test the py3k-short-float-repr
>> > branch on any of the following machines, that would be great:
>> >
>> [...]
>> > Something bigendian, like a G4 Mac
>>
>> I'll crank up some OS X installer builds and run them on G3 and G4 Macs
>> vs 32-/64- Intel.  Any tests of interest beyond the default regrtest.py?

Ned, many thanks for doing this!

> Then I tried a couple of random floats:
>
> Python 3.1a2+ (py3k-short-float-repr, Apr 13 2009, 20:55:35)
> [GCC 4.0.1 (Apple Inc. build 5490)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> 3.1
> -9.255965342383856e+61
> >>> 1.
> ^C
> Terminated  <-- kill needed

Cool!  I suspect endianness issues.  As evidence, I present:

>>> list(struct.pack('<d', 3.1))
[205, 204, 204, 204, 204, 204, 8, 64]
>>> list(struct.pack('<d', -9.255965342383856e+61))
[204, 204, 8, 64, 205, 204, 204, 204]

Mark


Re: [Python-Dev] Shorter float repr in Python 3.1?

2009-04-14 Thread Mark Dickinson
By the way, a simple native build on OS X 10.4/PPC passed all tests (that
we're already failing before).

Mark


Re: [Python-Dev] Shorter float repr in Python 3.1?

2009-04-14 Thread Mark Dickinson
On Tue, Apr 14, 2009 at 11:37 AM, Mark Dickinson  wrote:
> By the way, a simple native build on OS X 10.4/PPC passed all tests (that
> we're already failing before).

s/we're/weren't


Re: [Python-Dev] Shorter float repr in Python 3.1?

2009-04-14 Thread Mark Dickinson
On Tue, Apr 14, 2009 at 9:45 AM, Ned Deily  wrote:
> FIrst attempt was a fat (32-bit i386 and ppc) build on 10.5 targeted for
> 10.3 and above; this is the similar to recent python.org OSX installers.

What's the proper way to create such a build?  I've been trying:

./configure --with-universal-archs=32-bit --enable-framework
--enable-universalsdk=/ MACOSX_DEPLOYMENT_TARGET=10.5

but the configure AC_C_BIGENDIAN macro doesn't seem to pick up
on the universality:  the output from ./configure contains the line:

checking whether byte ordering is bigendian... no

I was expecting a "... universal" instead of "... no".

From reading the autoconf manual, it seems as though AC_C_BIGENDIAN
knows some magic to make things work for universal builds; it ought to be
possible to imitate that magic somehow.

Mark


Re: [Python-Dev] Shorter float repr in Python 3.1?

2009-04-14 Thread Mark Dickinson
Okay, I think I might have fixed up the float endianness detection for
universal builds on OS X.  Ned, any chance you could give this
another try with an updated version of the py3k-short-float-repr branch?

One thing I don't understand:

Is it true that to produce a working universal/fat build of Python,
one has to first regenerate configure and pyconfig.h.in using autoconf
version >= 2.62?  If not, then I don't understand how the
AC_C_BIGENDIAN autoconf macro can be giving the right results.

Mark


Re: [Python-Dev] Shorter float repr in Python 3.1?

2009-04-14 Thread Mark Dickinson
On Tue, Apr 14, 2009 at 5:14 PM, Antoine Pitrou  wrote:

> If this approach is sane, could it be adopted for all other instances of
> endianness detection in the py3k code base?

I think everything else is fine:  float endianness detection (for marshal,
pickle, struct) is done at runtime. Integer endianness detection goes
via AC_C_BIGENDIAN, which understands universal builds---but only
for autoconf >= 2.62.
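
The runtime float check is easy to imitate in pure Python (a sketch of
the idea only; the real detection is done in C at interpreter startup):

import struct

native = struct.pack('=d', 2.0)
if native == struct.pack('<d', 2.0):
    double_format = 'IEEE, little-endian'
elif native == struct.pack('>d', 2.0):
    double_format = 'IEEE, big-endian'
else:
    double_format = 'unknown'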

> Has anyone tested a recent py3k using universal builds? Do all tests pass?

Do you know the right way to create a universal build?  If so, I'm in a position
to test on 32-bit PPC, 32-bit Intel and 64-bit Intel.

Mark


Re: [Python-Dev] Shorter float repr in Python 3.1?

2009-04-14 Thread Mark Dickinson
On Tue, Apr 14, 2009 at 5:49 PM, Antoine Pitrou  wrote:
> Mark Dickinson <dickin...@gmail.com> writes:
>> Do you know the right way to create a universal build?
>
> Not at all, sorry.

No problem :). I might try asking on the pythonmac-sig list.

Mark


Re: [Python-Dev] Shorter float repr in Python 3.1?

2009-04-14 Thread Mark Dickinson
On Tue, Apr 14, 2009 at 6:55 PM, "Martin v. Löwis"  wrote:
> The outcome of AC_C_BIGENDIAN isn't used on OSX. Depending on the exact
> version you look at, things might work differently; in trunk,
> Include/pymacconfig.h should be used [...]

Many thanks---that was the missing piece of the puzzle.  I think I
understand how to make things work now.

Mark


Re: [Python-Dev] Shorter float repr in Python 3.1?

2009-04-14 Thread Mark Dickinson
On Tue, Apr 14, 2009 at 6:32 PM, Ned Deily  wrote:
> The OSX installer script is in Mac/BuildScript/build-installer.py.
>
> For 2-way builds, it essentially does:
>
> export MACOSX_DEPLOYMENT_TARGET=10.3
> configure -C --enable-framework
>   --enable-universalsdk=/Developer/SDKs/MacOSX10.4u.sdk
>   --with-universal-archs='32-bit' --with-computed-gotos OPT='-g -O3'

Great---thank you!  And thank you for all the testing.

I'll try to sort all this out later this evening (GMT+1);  I think I
understand how to fix everything now.

Mark


Re: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143

2009-04-17 Thread Mark Dickinson
On Fri, Apr 17, 2009 at 3:58 PM, Scott David Daniels wrote:
> Non-associativity is what makes for floating point headaches.
> To my knowledge, floating point is at least commutative.

Well, mostly. :-)

>>> from decimal import Decimal
>>> x, y = Decimal('NaN123'), Decimal('-NaN456')
>>> x + y
Decimal('NaN123')
>>> y + x
Decimal('-NaN456')

Similar effects can happen with regular IEEE 754 binary doubles,
but Python doesn't expose NaN payloads or signs, so we don't
see those effects witihin Python.

Mark


Re: [Python-Dev] Summary of Python tracker Issues

2009-04-24 Thread Mark Dickinson
On Fri, Apr 24, 2009 at 9:25 PM, Terry Reedy  wrote:
> In going through this, I notice a lot of effort by Mark Dickinson and others

Many others, but Eric Smith's name needs to be in big lights here.
There's no way the short float repr would have been ready for 3.1 if
Eric hadn't shown an interest in this at PyCon, and then taken on
the major internal replumbing job this entailed for all of Python's
string formatting.

> 3.1.  As a certain-to-be beneficiary, I want to thank all who contributed.

Glad you like it!

Mark


[Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Mark Dickinson
I'd like to propose two minor changes to float and complex
formatting, for 3.1.  I don't think either change should prove
particularly disruptive.

(1) Currently, '%f' formatting automatically changes to '%g' formatting for
numbers larger than 1e50.  For example:

>>> '%f' % 2**166.
'93536104789177786765035829293842113257979682750464.000000'
>>> '%f' % 2**167.
'1.87072e+50'

I propose removing this feature for 3.1

More details: The current behaviour is documented (standard
library->builtin types).  (Until very recently, it was actually
misdocumented as changing at 1e25, not 1e50.)

"""For safety reasons, floating point precisions are clipped to 50; %f
conversions for numbers whose absolute value is over 1e50 are
replaced by %g conversions. [5] All other errors raise exceptions."""

There's even a footnote:

"""[5]  These numbers are fairly arbitrary. They are intended to
avoid printing endless strings of meaningless digits without
hampering correct use and without having to know the exact
precision of floating point values on a particular machine."""

I don't find this particularly convincing, though---I just don't see
a really good reason not to give the user exactly what she/he
asks for here.  I have a suspicion that at least part of the
motivation for the '%f' -> '%g' switch is that it means the
implementation can use a fixed-size buffer.  But Eric has
fixed this (in 3.1, at least) and the buffer is now dynamically
allocated, so this isn't a concern any more.

Other reasons not to switch from '%f' to '%g' in this way:

 - the change isn't gentle:  as you go over the 1e50 boundary,
   the number of significant digits produced suddenly changes
   from 56 to 6;  it would make more sense to me if it
   stayed fixed at 56 sig digits for numbers larger than 1e50.
 - now that we're using David Gay's 'perfect rounding'
   code, we can be sure that the digits aren't entirely
   meaningless, or at least that they're the 'right' meaningless
   digits.  This wasn't true before.
 - C doesn't do this, and the %f, %g, %e formats really
   owe their heritage to C.
 - float formatting is already quite complicated enough; no
   need to add to the mental complexity
 - removal simplifies the implementation :-)


On to the second proposed change:

(2) complex str and repr don't behave like float str and repr, in that
the float version always adds a trailing '.0' (unless there's an
exponent), but the complex version doesn't:

>>> 4., 10.
(4.0, 10.0)
>>> 4. + 10.j
(4+10j)

I propose changing the complex str and repr to behave like the
float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
rather than "(4+10j)".

Mostly this is just about consistency, ease of implementation,
and aesthetics.  As far as I can tell, the extra '.0' in the float
repr serves two closely-related purposes:  it makes it clear to
the human reader that the number is a float rather than an
integer, and it makes sure that e.g., eval(repr(x)) recovers a
float rather than an int.  The latter point isn't a concern for
the current complex repr, but the former is:  4+10j looks to
me more like a Gaussian integer than a complex number.

Any comments?

Mark


[Python-Dev] Bug tracker down?

2009-04-26 Thread Mark Dickinson
The bugs.python.org site seems to be down.  ping gives me
the following (from Ireland):

Macintosh-4:py3k dickinsm$ ping bugs.python.org
PING bugs.python.org (88.198.142.26): 56 data bytes
36 bytes from et.2.16.rs3k6.rz5.hetzner.de (213.239.244.101):
Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks  Src  Dst
 4  5  00 5400 77e1   0   3a  01 603d 192.168.1.2  88.198.142.26

Various others on #python-dev have confirmed that it's not working for them.
Does anyone know what the problem is?

Mark


Re: [Python-Dev] Bug tracker down?

2009-04-26 Thread Mark Dickinson
On Sun, Apr 26, 2009 at 4:19 PM, Aahz  wrote:
> On Sun, Apr 26, 2009, Mark Dickinson wrote:
>>
>> The bugs.python.org site seems to be down.
>
> Dunno -- forwarded to the people who can do something about it.  (There's
> a migration to a new mailserver going on, but I don't think this is
> related.)

Thanks.  Who should I contact next time, to avoid spamming python-dev?

Mark


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Mark Dickinson
On Sun, Apr 26, 2009 at 5:59 PM, Eric Smith  wrote:
> Mark Dickinson wrote:
>> I propose changing the complex str and repr to behave like the
>> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
>> rather than "(4+10j)".
>
> I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, I'm
> not sure about the spaces around the sign. If we do want the spaces there,

Whoops.  The spaces were a mistake:  I'm not proposing to add those.
I meant "(4.0+10.0j)" rather than "(4.0 + 10.0j)".

Mark


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Mark Dickinson
On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels wrote:
> As a user of Idle, I would not like to see the change you seek of
> having %f stay full-precision.  When a number gets too long to print
> on a single line, the wrap depends on the current window width, and
> is calculated dynamically.  One section of the display with a 8000
> -digit (100-line) text makes Idle slow to scroll around in.  It is
> too easy for numbers to go massively positive in a bug.

I see your point.  Since we're talking about floats, though, there
should never be more than 317 characters in a '%f' % x: the
largest float is around 1.8e308, giving 309 digits before the
point, 6 after, a decimal point, and possibly a minus sign.
(Assuming that your platform uses IEEE 754 doubles.)
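
A quick check of the worst case (assuming sys.float_info, available
from 2.6 on):

>>> import sys
>>> len('%f' % -sys.float_info.max)   # sign + 309 digits + '.' + 6 digits
317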

> However, this is, I agree, a problem.  Since all of these numbers
> should end in a massive number of zeroes

But they typically don't end in zeros (except the six zeros following
the point),
because they're stored in binary rather than decimal.  For example:

>>> int(1e308)
10000000000000000109790636294404554174049230967731184633681068290315758540491149153716332897849468889906124966972117251561159028374314008832830700919814604603127166450293302718569748969958855904333838446616500117842689762621294517762809119578670745812278397017178441510529180289320787327297488571543022311833 6

Mark


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Mark Dickinson
On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels wrote:

> I had also said (without explaining):
>> > only the trailing zeroes with the e, so we wind up with:
>> >      1157920892373161954235709850086879078532699846656405640e+23
>> >  or 115792089237316195423570985008687907853269984665640564.0e+24
>> >  or some such, rather than
>> >      1.157920892373162e+77
>> >  or 1.15792089237316195423570985008687907853269984665640564e+77
> These are all possible representations for 2 ** 256.

Understood.

> _but_ the printed decimal number I am proposing is within one ULP of
> the value of the binary number.

But there are plenty of ways to get this if this is what you want: if
you want a displayed result that's within 1 ulp (or 0.5 ulps, which
would be better) of the true value then repr should serve your needs.
If you want more control over the number of significant digits then
'%g' formatting gives that, together with a nice-looking output for
small numbers.

It's only '%f' formatting that I'm proposing changing: I see a
'%.2f' formatting request as a very specific, precise one: give me
exactly 2 digits after the point---no more, no less, and it seems
wrong and arbitrary that this request should be ignored for
numbers larger than 1e50 in absolute value.

That is, for general float formatting needs, use %g, str and repr.
%e and %f are for when you want fine control.

> That is, the majority of the digits
> in int(1e308) are a fiction

Not really: the float that Python stores has a very specific value,
and the '%f' formatting is showing exactly that value.  (Yes, I
know that some people advocate viewing a float as a range
of values rather than a specific value;  but I'm pretty sure that
that's not the way that the creators of IEEE 754 were thinking.)

> zeros get taken off the representation.  The reason I don't care is
> that the code from getting a floating point value is tricky, and I
> suspect the printing code might not easily be able to distinguish
> between a significant trailing zero and fictitous bits.

As of 3.1, the printing code should be fine:  it's using David
Gay's 'perfect rounding' code, so what's displayed should
be correctly rounded to the requested precision.

Mark


[Python-Dev] One more proposed formatting change for 3.1

2009-04-28 Thread Mark Dickinson
Here's one more proposed change, this time for formatting
of floats using format() and the empty presentation type.
To avoid repeating myself, here's the text from the issue
I just opened:

http://bugs.python.org/issue5864

"""
In all versions of Python from 2.6 up, I get the following behaviour:

>>> format(123.456, '.4')
'123.5'
>>> format(1234.56, '.4')
'1235.0'
>>> format(12345.6, '.4')
'1.235e+04'

The first and third results are as I expect, but the second is somewhat
misleading: it gives 5 significant digits when only 4 were requested,
and moreover the last digit is incorrect.

I propose that Python 2.7 and Python 3.1 be changed so that the output
for the second line above is '1.235e+03'.
"""

This issue seems fairly clear cut to me, and I doubt that there's been
enough uptake of 'format' yet for this to risk significant breakage.  So
unless there are objections I'll plan to make this change before this
weekend's beta.

Mark


Re: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef

2009-05-04 Thread Mark Dickinson
On Mon, May 4, 2009 at 10:10 AM, Larry Hastings  wrote:

> So: you don't need it, it clutters up our code (particularly typeobject.c),
> and it adds overhead.  The only good reason to keep it is backwards
> compatibility, which I admit is a fine reason.

Presumably whoever added the context field had a reason for doing so.
Does anyone remember what the intended use was?

Trawling through the history, all I could find was this comment,
attached to revision 23270: [Modified Thu Sep 20 21:45:26 2001
UTC (7 years, 7 months ago) by gvanrossum]

"""
Add optional docstrings to getset descriptors.  Fortunately, there's
no backwards compatibility to worry about, so I just pushed the
'closure' struct member to the back -- it's never used in the current
code base (I may eliminate it, but that's more work because the getter
and setter signatures would have to change.)
"""

Still, binary compatibility seems like a fairly strong reason not to
remove the closure field.

Mark


Re: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef

2009-05-04 Thread Mark Dickinson
On Mon, May 4, 2009 at 8:11 PM, Daniel Stutzbach wrote:
> If you make the change, will 3rd party code that relies on it fail in
> unexpected ways, or will they just get a compile error?

I *think* that third party code that's recompiled for 3.1 and that
doesn't use the closure field will either just work, or will produce an
easily-fixed compile error.  Larry, does this sound right?

But I guess the bigger issue is that extensions already compiled against 3.0
that use PyGetSetDef (even if they don't make use of the closure field)
won't work with 3.1 without a recompile:  they'll segfault, or otherwise behave
unpredictably.

If that's not considered a problem, then surely we ought to be getting rid of
tp_reserved?

Mark


Re: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef

2009-05-04 Thread Mark Dickinson
On Mon, May 4, 2009 at 9:15 PM, Antoine Pitrou  wrote:
> Mark Dickinson <dickin...@gmail.com> writes:
>>
>> I *think* that third party code that's recompiled for 3.1 and that
>> doesn't use the closure field will either just work, or will produce an
>> easily-fixed compile error.  Larry, does this sound right?
>
> This doesn't sound right. The functions in the third party code will get
> compiled with the wrong signature, so they can crash (or behave unexpectedly)
> when called by Python.

Yes, of course the signature of the getters and setters changes.  Please
ignore me. :-)

Mark


Re: [Python-Dev] pylinting the stdlib

2009-08-02 Thread Mark Dickinson
On Sat, Aug 1, 2009 at 11:40 PM, Vincent Legoll wrote:
> Hello,
>
> I've fed parts of the stdlib to pylint and after some filtering
> there appears to be some things that looks strange, I've
> filled a few bugs to the tracker for them.
>
> 
>
> Is this useless and taking reviewer's time for nothing ?
>
> Please advise, if this is deemed useful, I'll continue further

I think this is valuable work---please do continue!

Just out of interest, how many false positives did you have
to filter out in finding the 5 cases above?

Mark


Re: [Python-Dev] PEP 385: pruning/reorganizing branches

2009-08-04 Thread Mark Dickinson
Comments on some of the branches I've had involvement with...

On Mon, Aug 3, 2009 at 11:51 AM, Dirkjan Ochtman wrote:

> py3k-short-float-repr: strip streamed-merge

Sounds fine.

> py3k-issue1717: keep-clone

I don't think there's any need to keep this branch;  its contents were
all merged (in pieces) to py3k (various revisions with numbers in
the range 69188--69225).  So I think 'strip streamed-merge' is
appropriate here, if I'm understanding your terminology.

> trunk-math:

I think this one can go down as 'strip', too;  there's nothing there of
interest that isn't already in trunk and py3k.  It was merged to
trunk in r62380.

--
Mark


Re: [Python-Dev] random number generator state

2009-08-15 Thread Mark Dickinson
On Sat, Aug 15, 2009 at 8:54 PM, Scott David Daniels wrote:
> [...] input to .setstate: old, new-short, and new-long.  In trying to
> get this to work, I found what might be a bug:
> code says
>  mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
> but probably should be:
>  mt[0] |= 0x80000000UL; /* MSB is 1; assuring non-zero initial array */

I'm 92.3% sure that this isn't a bug.  For one thing, that line comes
directly from the authors' code[1], so if it's a bug then it's a bug in
the original code, dating from 2002;  this seems unlikely, given how
widely used and (presumably) well-scrutinized MT is.

For a more technical justification, the Mersenne Twister is based
on a linear transformation of a 19937-dimensional vector space
over F2, so its state naturally consists of 19937 bits of information,
which is 623 words plus one additional bit.  In this implementation,
that extra bit is the top bit of the first word;  the other 31 bits of that
first word shouldn't really be regarded as part of the state proper.
If you examine the genrand_int32 function in _randommodule.c,
you'll see that the low 31 bits of mt[0] play no role in updating the
state;  i.e., their value doesn't affect the new state.  So using
mt[0] |= 0x80000000UL instead of mt[0] = 0x80000000UL during
initialization should make no difference to the resulting stream of
random numbers (with the possible exception of the first random
number generated).
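
This is easy to check from Python with getstate()/setstate() (a sketch;
state[0] is mt[0], and the last element of the state tuple is the word
index):

import random

r1 = random.Random(12345)
version, state, gauss_next = r1.getstate()
# flip the low 31 bits of mt[0], leaving the MSB alone
state = (state[0] ^ 0x7fffffff,) + state[1:]
r2 = random.Random()
r2.setstate((version, state, gauss_next))
# identical output streams, as argued above
assert [r1.random() for _ in range(10)] == [r2.random() for _ in range(10)]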

[1] http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c

Mark


Re: [Python-Dev] Numeric alignment issue with PEP 3101

2009-09-08 Thread Mark Dickinson
On Mon, Sep 7, 2009 at 11:10 PM, Eric Smith wrote:
> Hmm, I never noticed that. At this point, I think changing the formatting
> for any types would break code, so we should just change the documentation
> to reflect how currently works.

I think the alignment for Decimal *does* need to be changed, though.  It
currently left-aligns by default (my fault:  I just blindly followed PEP 3101
without thinking too hard about it).  I'd like to fix this for 3.2 and 2.7; I'm
not sure whether it's too disruptive to fix it in 3.1 and 2.6.
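
Concretely, the current behaviour looks like this (the widths in these
examples are mine, chosen just for illustration):

>>> from decimal import Decimal
>>> format(1.23, '10')              # float: right-aligned by default
'      1.23'
>>> format(Decimal('1.23'), '10')   # Decimal: left-aligned by default
'1.23      '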

Mark


Re: [Python-Dev] operator precedence of __eq__, __ne__, etc, if both object have implementations

2009-09-22 Thread Mark Dickinson
On Tue, Sep 22, 2009 at 3:37 PM, Chris Withers  wrote:
> Where are the specifications on what happens if two objects are compared and 
> both have implementations of __eq__? Which __eq__ is called?
> What happens if the first one called returns False? Is the second one called? 
> What is one implements __eq__ and the other __ne__?

I (still :-) think this is covered, for Python 2.x at least, by:

http://docs.python.org/reference/datamodel.html#coercion-rules

Specifically, the bits that say:

- For objects x and y, first x.__op__(y) is tried. If this is not
implemented or returns NotImplemented, y.__rop__(x) is tried. If this
is also not implemented or returns NotImplemented, a TypeError
exception is raised. But see the following exception:

- Exception to the previous item: if the left operand is an instance
of a built-in type or a new-style class, and the right operand is an
instance of a proper subclass of that type or class and overrides the
base’s __rop__() method, the right operand’s __rop__() method is tried
before the left operand’s __op__() method.

I agree that having these rules in a section called 'Coercion rules'
is a bit confusing.
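
In rough Python terms, the dispatch for x == y looks something like the
following (a simplified sketch; the real logic is do_richcompare in
Objects/object.c, and the "overrides" test is elided here):

def do_eq(x, y):
    first, second = x, y
    if type(y) is not type(x) and isinstance(y, type(x)):
        first, second = y, x         # the subclass gets the first shot
    result = first.__eq__(second)
    if result is NotImplemented:
        result = second.__eq__(first)
    if result is NotImplemented:
        result = (x is y)            # last-resort fallback for ==
    return result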

> Python 2
> http://pastebin.com/f8f19ab3
>
> Python 3
> http://pastebin.com/f55e44630

The duplicate __eq__ calls in these pastes are disturbing:  in short,
if A and B are new-style classes defining __eq__, it seems that
A() == B() ends up calling A.__eq__ twice *and* B.__eq__ twice, in the
order A.__eq__, B.__eq__, B.__eq__, A.__eq__.

In 3.x, slot_tp_richcompare (in typeobject.c) makes two calls to
half_richcompare;  I think the second is redundant.  The coercion
rules are already taken care of in do_richcompare (in object.c).  I
tried removing the second call to half_richcompare, and the entire
test-suite still runs without errors.

In 2.x, it's possible that this call is necessary for some bizarre
combinations of __cmp__ and __eq__;  I haven't tried to get my head
around this yet.

I'll open an issue for the duplicate __eq__ calls.

Mark


Re: [Python-Dev] operator precedence of __eq__, __ne__, etc, if both object have implementations

2009-09-22 Thread Mark Dickinson
On Tue, Sep 22, 2009 at 4:12 PM, Mark Dickinson  wrote:
> I'll open an issue for the duplicate __eq__ calls.

http://bugs.python.org/issue6970

Mark


Re: [Python-Dev] operator precedence of __eq__, __ne__, etc, if both object have implementations

2009-09-23 Thread Mark Dickinson
On Wed, Sep 23, 2009 at 9:12 AM, Chris Withers  wrote:
> Mark Dickinson wrote:
>>
>> I (still :-) think this is covered, for Python 2.x at least, by:
>>
>> http://docs.python.org/reference/datamodel.html#coercion-rules
>
> But this isn't coercion! :-)

Agreed.  FWIW this behaviour for arithmetic operations is also mentioned in

http://docs.python.org/reference/datamodel.html#emulating-numeric-types

but then again that section doesn't include the comparison operators.

>
>> - For objects x and y, first x.__op__(y) is tried. If this is not
>> implemented or returns NotImplemented, y.__rop__(x) is tried.
>
> Also, the above is not so:
>
> Python 2.5.1
>>>> class X:
> ...   def __eq__(self,other):
> ...     print "X __eq__"
>>>> class Z: pass
> ...
>>>> Z()==X()
> X __eq__
>
> No __req__ in sight...

Okay, so combine this with the sentence under:

http://docs.python.org/reference/datamodel.html#object.__eq__

that says:

"There are no swapped-argument versions of these methods (to be used
when the left argument does not support the operation but the right
argument does); rather, __lt__() and __gt__() are each other’s
reflection, __le__() and __ge__() are each other’s reflection, and
__eq__() and __ne__() are their own reflection."

So in the earlier doc snippets, if __op__ is __eq__, then __rop__
should also be interpreted as __eq__;  similarly if __op__ is __lt__
then __rop__ is __gt__.
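
So the "swapped" partner of each method can be summarized in a small
table (the dict is mine, just restating the quoted text):

REFLECTION = {
    '__lt__': '__gt__', '__gt__': '__lt__',
    '__le__': '__ge__', '__ge__': '__le__',
    '__eq__': '__eq__', '__ne__': '__ne__',
}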

I'm not saying that the documentation here couldn't be improved;  just
that IMO the docs do (with a little bit of extrapolation) describe
what should happen, giving the 'official' specification that you were
after.

I don't know where/whether the behaviour for classes that define both
__cmp__ and __eq__ are documented, though, and I'm far from sure what
the rules are in that case.

One minor oddity is that for arithmetic operations like __add__(self,
other), if type(self) == type(other) then __radd__ isn't checked, on
the basis that if __add__ fails then the operation is presumably not
supported.  This makes sense, but I wonder why this doesn't apply
equally well to __eq__:  that is, in doing A() == A() for a class A,
why do we try the __eq__ method of both instances?

Mark


Re: [Python-Dev] operator precedence of __eq__, __ne__, etc, if both object have implementations

2009-09-23 Thread Mark Dickinson
On Wed, Sep 23, 2009 at 4:43 AM,   wrote:
>
>    Dino> For IronPython we wrote a set of tests which go through and define
>    Dino> the various operator methods in all sorts of combinations on both
>    Dino> new-style and old-style classes as well as subclasses of those
>    Dino> classes and then do the comparisons w/ logging.
>
> It would be very nice if these complex corner cases had a set of test
> cases which could be run by all implementations (CPython, Jython,
> IronPython, PyPy, etc).  I don't know.  Maybe the CPython test suite serves
> that purpose, but it seems like it would be helpful if this sort of
> "validation suite" was maintained as a separate project all implementations
> could use and contribute to.

+1

Mark


Re: [Python-Dev] operator precedence of __eq__, __ne__, etc, if both object have implementations

2009-09-23 Thread Mark Dickinson
On Wed, Sep 23, 2009 at 4:54 PM, Dino Viehland  wrote:
> We are going to start contributing tests back real soon now.  I'm not sure
> that these are the best tests to contribute as they require a version of
> Python to compare against rather than being nice and stand alone.  But I'm
> sure we have other tests which cover this as well just not as exhaustively.
> We could also possibly check in the baseline file and then CPython could
> compare itself to previous versions but it'd probably be a pretty
> big file - so it probably shouldn't be included in the standard install
> in the tests directory.

How big is big?  For comparison, CPython's Lib/test/decimaltestdata
directory alone is already over 4Mb, so maybe size isn't an issue?

Mark


[Python-Dev] Python build question (fixing pymath.c).

2009-09-27 Thread Mark Dickinson
Hello all,

I'm looking for advice on how to clean up an ugliness (one
that I'm at least partly responsible for) in the Python build setup.

Here's the problem:  some of the exported functions (e.g. atanh,
log1p) in the Python/pymath.c file aren't needed by the Python core
at all; they're exported solely for use in the math and cmath modules.
Even worse, these exported functions have no '_Py' or 'Py' prefix.

Since I'm currently working on adding some oft-requested
functions to the math module (gamma, lgamma, erf, ...) it seemed
like a good time to clean this up.

So I've now got a file Modules/math_support.c that contains
some functions needed by both mathmodule.c and
cmathmodule.c, as well as a couple of functions only
currently needed by the math module.  How should I incorporate
this file into the build?

One obvious solution seems to be to build an extra math_support.so
file that's then used by both mathmodule.so and cmathmodule.so.
Is this a sensible solution?  Are there better ways?

A complication to bear in mind is that some users may want to
alter Modules/Setup.dist so that either the math module and/or
the cmath module is included in the core Python executable.

Cluelessly yours,

Mark


Re: [Python-Dev] Python build question (fixing pymath.c).

2009-09-27 Thread Mark Dickinson
On Sun, Sep 27, 2009 at 8:48 AM, Brett Cannon  wrote:
> On Sun, Sep 27, 2009 at 00:21, Mark Dickinson  wrote:
[...]
>> So I've now got a file Modules/math_support.c that contains
>> some functions needed by both mathmodule.c and
>> cmathmodule.c, as well as a couple of functions only
>> currently needed by the math module.  How should I incorporate
>> this file into the build?
>>
>> One obvious solution seems to be to build an extra math_support.so
>> file that's then used by both mathmodule.so and cmathmodule.so.
>> Is this a sensible solution?  Are there better ways?
>>
>
> Are you planning on exposing any of this outside of those two modules?

No.

> If not then I would change the name to _math.so

That makes sense.

>> A complication to bear in mind is that some users may want to
>> alter Modules/Setup.dist so that either the math module and/or
>> the cmath module is included in the core Python executable.
>
> If you are mucking with Modules/Setup.dist you better know how to
> figure out that those two modules depend on another .so.

Sure.  I'll at least add a comment pointing out the dependence, though.

Thanks,

Mark


Re: [Python-Dev] PEP 3144 review.

2009-09-28 Thread Mark Dickinson
On Mon, Sep 28, 2009 at 3:04 PM, Daniel Stutzbach
 wrote:
> On Mon, Sep 28, 2009 at 7:24 AM, Nick Coghlan  wrote:
>>
>> I should note that I've softened my position slightly from what I posted
>> yesterday. I could live with the following compromise:
>>
>>    >>> x = IPv4Network('192.168.1.1/24')
>>    >>> y = IPv4Network('192.168.1.0/24')
>>    >>> x == y # Equality is the part I really want to see changed
>>    True
>>    >>> x.ip
>>    IPv4Address('192.168.1.1')
>>    >>> y.ip
>>    IPv4Address('192.168.1.0')
>
> With those semantics, IPv4Network objects with distinct IP addresses (but
> the same network) could no longer be stored in a dictionary or set.  IMO, it
> is a little counter-intuitive for objects to compare equal yet have
> different properties.  I don't think this is a good compromise.

This worries me too.  It seems like a potentially dangerous half-measure.

Mark


Re: [Python-Dev] PEP 3144 review.

2009-09-28 Thread Mark Dickinson
On Mon, Sep 28, 2009 at 3:42 PM, Dj Gilcrease  wrote:
> On Mon, Sep 28, 2009 at 8:04 AM, Daniel Stutzbach
>  wrote:
>> On Mon, Sep 28, 2009 at 7:24 AM, Nick Coghlan  wrote:
>>>
>>> I should note that I've softened my position slightly from what I posted
>>> yesterday. I could live with the following compromise:
>>>
>>>    >>> x = IPv4Network('192.168.1.1/24')
>>>    >>> y = IPv4Network('192.168.1.0/24')
>>>    >>> x == y # Equality is the part I really want to see changed
>>>    True
>>>    >>> x.ip
>>>    IPv4Address('192.168.1.1')
>>>    >>> y.ip
>>>    IPv4Address('192.168.1.0')
>>
>> With those semantics, IPv4Network objects with distinct IP addresses (but
>> the same network) could no longer be stored in a dictionary or set.  IMO, it
>> is a little counter-intuitive for objects to compare equal yet have
>> different properties.  I don't think this is a good compromise.
>
> That's not true, the patch I submitted
> http://codereview.appspot.com/124057 still allows the networks to be
> included in a set or as a dict key
>
> >>> net1 = IPNetwork("10.1.2.3/24")
> >>> net2 = IPNetwork("10.1.2.0/24")
> >>> print hash(net1) == hash(net2)
> False
> >>> print net1 == net2
> True

In that case, your patch breaks the rather fundamental rule that
Python objects that compare equal should have equal hash.  :-)

Relying on hashes to be distinct isn't safe.
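
To see why, here's a minimal sketch (a made-up class, not the actual
patch) of equal-but-unequally-hashed objects misbehaving in a set:

    class Network(object):
        def __init__(self, ip, prefix):
            self.ip = ip
            self.prefix = prefix
        def __eq__(self, other):
            # equality crudely ignores the host part...
            return self.prefix == other.prefix
        def __hash__(self):
            # ...but the hash doesn't: equal objects hash differently
            return hash((self.ip, self.prefix))

    net1 = Network("10.1.2.3", 24)
    net2 = Network("10.1.2.0", 24)
    assert net1 == net2
    print(net2 in set([net1]))  # almost always False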

Mark


Re: [Python-Dev] PEP 3144 review.

2009-09-30 Thread Mark Dickinson
On Wed, Sep 30, 2009 at 1:44 AM, Nick Coghlan  wrote:
> Martin v. Löwis wrote:
>>> I would say that there certainly are precedents in other areas for
>>> keeping the information about the input form around. For example,
>>> occasionally it would be handy if parsing a hex integer returned an
>>> object that was compatible with other integers but somehow kept a hint
>>> that would cause printing it to use hex by default.
>>
>> At the risk of bringing in false analogies: it seems that Python
>> typically represents values of some type in their canonical form,
>> rather than remembering the form in which they arrived in the program:
>> - integer values "forget" how many preceding zeroes they have
>> - string literals forget which of the characters had been escaped, and
>>   whether the string was single- or double-quoted
>> - floating point values forget a lot more about their literal
>>   representation (including even the literal decimal value)
>>
>> I guess a close case would be rational numbers: clearly, 3÷2 == 6÷4;
>> would a Python library still remember (and repr) the original numerator
>> and denominator?
>
> For a concrete example of an object which remembers details about its
> creation that it ignores when determining equality, we have decimal.Decimal:
>
> .>> from decimal import Decimal as d
> .>> x = d("3.0")
> .>> y = d("3.00")
> .>> x
> d("3.0")
> .>> y
> d("3.00")
> .>> repr(x) == repr(y)
> False
> .>> x.as_tuple() == y.as_tuple()
> False
> .>> x == y
> True
[snipped]
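
(As an aside, Martin's Rational question has a concrete answer in the
stdlib: fractions.Fraction normalises on construction and forgets the
original numerator and denominator:

    >>> from fractions import Fraction
    >>> Fraction(6, 4)
    Fraction(3, 2)

so it takes the opposite choice to Decimal here.)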

[More on the Decimal analogy below.]

Please could someone who understands the uses of IPNetwork better than
I do explain why the following wouldn't be a significant problem, if __eq__
and __hash__ were modified to disregard the .ip attribute as suggested:

>>> linus = IPv4Network('172.16.200.1/24')
>>> snoopy = IPv4Network('172.16.200.3/24')
>>> fqdn = {linus: 'linus.peanuts.net', snoopy: 'snoopy.peanuts.net'}
>>> fqdn[linus]  # expecting 'linus.peanuts.net'
'snoopy.peanuts.net'

Is this just a problem of education, teaching the users not to abuse
IPv4Network this way?  Or is this just an unlikely use of IPv4Network?
Or have I misunderstood the proposal altogether?

As for Decimal, I see that as another whole kettle of tuna:  equality for
Decimal couldn't reasonably have been done any other way---if it weren't
mandated by the standard, there would still be a very strong expectation
that == would mean numeric equality.  That is, I see the == operator
as having two distinct but mostly compatible uses in Python:  it's
used for numeric equality, *and* it's used as the equivalence relation for
determining container membership.  Mostly these two different meanings
get along fine, though they lead to some fun when trying to ensure
that x == y implies hash(x) == hash(y) for x and y two different numeric
types.
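
(The way that fun gets resolved, incidentally, is that CPython makes the
hashes of equal values agree across the numeric types:

    >>> from fractions import Fraction
    >>> from decimal import Decimal
    >>> hash(2) == hash(2.0) == hash(Fraction(2)) == hash(Decimal(2))
    True

which is what keeps mixed-type sets and dicts honest.)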

But since Decimals and floats aren't used as set elements or dict keys
that often, the fact that you can't store Decimal('1.0') and Decimal('1.00')
together in a set doesn't often get in the way.  I'd expect putting
IPv4Network objects in a set or dict to be more common.

Mark


Re: [Python-Dev] PEP 3144 review.

2009-09-30 Thread Mark Dickinson
On Wed, Sep 30, 2009 at 10:52 AM, Paul Moore  wrote:
> 2009/9/30 Mark Dickinson :
>> Please could someone who understands the uses of IPNetwork better than
>> I do explain why the following wouldn't be a significant problem, if __eq__
>> and __hash__ were modified to disregard the .ip attribute as suggested:
>>
>>>>> linus = IPv4Network('172.16.200.1/24')
>>>>> snoopy = IPv4Network('172.16.200.3/24')
>>>>> fqdn = {linus: 'linus.peanuts.net', snoopy: 'snoopy.peanuts.net'}
>>>>> fqdn[linus]  # expecting 'linus.peanuts.net'
>> 'snoopy.peanuts.net'
>
> I certainly don't understand IPv4Network better than you :-) But that
> just looks wrong to me - linus and snoopy are hosts not networks, so
> making them IPv4Network classes seems wrong. I'd instinctively make
> them IPv4Address objects (which, I believe, would work).

Okay, so maybe this is an abuse of IPv4Network.  But I'd (mis?)understood
that the retention of the .ip attribute was precisely a convenience to allow
this sort of use.  If not, then what's it for?  I've read the PEP and almost
all of this thread, but I can't help feeling I'm still missing something.  If
someone could point out the obvious to me I'd be grateful.

I don't have any opinion on whether the ip attribute should be retained
or not; but retaining it *and* ignoring it in comparisons just seems a
bit odd.

Mark


Re: [Python-Dev] PEP 3144 review.

2009-09-30 Thread Mark Dickinson
On Wed, Sep 30, 2009 at 12:06 PM, Nick Coghlan  wrote:
> Mark Dickinson wrote:
>> Okay, so maybe this is an abuse of IPv4Network.  But I'd (mis?)understood
>> that the retention of the .ip attribute was precisely a convenience to allow
>> this sort of use.  If not, then what's it for?  I've read the PEP and almost
>> all of this thread, but I can't help feeling I'm still missing something.  If
>> someone could point out the obvious to me I'd be grateful.
>
> You're not missing anything that I'm aware of - unlike the use case for
> accepting a denormalised network definition in the IPNetwork constructor
> (which has been made quite clear in the list discussion, even if it is
> still a bit vague in the PEP), the use case for *retaining* the host
> information on the network object hasn't been well articulated at all.
>
> The closest anyone has come to describing a use case is an indirect
> comment via Guido that leaving out the attribute would involve real code
> having to find somewhere else to stash the original address details
> (e.g. by passing around an IPAddress/IPNetwork tuple rather than just an
> IPNetwork object).

Ah, thanks---I'd missed that bit.  So the .ip attribute is mainly for
backwards compatibility with existing uses/users of ipaddr.  I guess
that makes sense, then.  In particular, if it's suggested that new code
shouldn't make use of the .ip attribute, then the list/dict membership
problems described above can't arise.

> However, while I'd still be a little happier if the .ip attribute went
> away all together and another means was found to conveniently associate
> an IPAddress and an IPNetwork, keeping it doesn't bother me anywhere
> near as much as having network equivalence defined in terms of something
> other than the network ID and the netmask.

Makes sense.

Thanks,

Mark


Re: [Python-Dev] summary of transitioning from % to {} formatting

2009-10-03 Thread Mark Dickinson
On Sat, Oct 3, 2009 at 4:41 PM, Steven Bethard  wrote:
> I thought it might be useful for those who don't have time to read a
> million posts to have a summary of what's happened in the formatting
> discussion.

Definitely useful.  Thanks for the summary!

[...]

> * Add a parameter which declares the type of format string::
>    logging.Formatter(fmt="{asctime} - {name}", format=BRACES)
>  The API code would then switch between %-format and {}-format
>  based on the value of that parameter. If %-formatting is to be
>  deprecated, this could be done by first deprecating
>  format=PERCENTS and requiring format=BRACES, and then changing the
>  default to format=BRACES.

+1.
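
The dispatch itself would be near-trivial; a rough sketch (BRACES and
PERCENTS as in the proposal, everything else hypothetical and untested):

    PERCENTS, BRACES = 'percent', 'braces'

    class Formatter(object):
        def __init__(self, fmt, format=PERCENTS):
            self._fmt = fmt
            self._braces = (format == BRACES)
        def format_fields(self, fields):
            # fields: a dict of values to substitute into the template
            if self._braces:
                return self._fmt.format(**fields)
            return self._fmt % fields

so Formatter(fmt="{asctime} - {name}", format=BRACES) works immediately,
and Formatter(fmt="%(asctime)s - %(name)s") keeps working until the
default flips.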

> * Create string subclasses which convert % use to .format calls::
>    __ = brace_fmt
>    logging.Formatter(fmt=__("{asctime} - {name}"))
>  The API code wouldn't have to change at all at first, as applying
>  % to brace_fmt objects would call .format() instead. If
>  %-formatting is to be deprecated, this could be done by first
>  deprecating plain strings and requiring brace_fmt strings, and
>  then allowing plain strings again but assuming they are {}-format
>  strings.

Uurgh.  This just feels... icky.  A badly-rationalized -1 from me.

> * Teach the API to accept callables as well as strings::
>    logging.Formatter(fmt="{asctime} - {name}".format)
>  The API code would just call the object with .format() style
>  arguments if a callable was given instead of a string. If
>  %-formatting is to be deprecated, this could be done by first
>  deprecating plain strings and requiring callables, and then
>  allowing plain strings again but assuming they are {}-format
>  strings

+0.5.  Seems like it could work, but the first solution feels
cleaner.

> * Create translators between %-format and {}-format::
>    assert to_braces("%(asctime)s") == "{asctime}"
>    assert to_percents("{asctime}") == "%(asctime)s"
>  these could then either be used outside of the API::
>    logging.Formatter(fmt=to_percents("{asctime} - {name}"))
>  or they could be used within the API combined with some sort of
>  heuristic for guessing whether a {}-format string or a %-format
>  string was passed in::
>    logging.Formatter(fmt="{asctime} - {name}")
>  If %-formatting is to be deprecated, the transition strategy here
>  is trivial. However, no one has yet written translators, and it is
>  not clear what heuristics should be used, e.g. should the method
>  just try %-formatting first and then {}-formatting if it fails?

I'm reserving judgement on this one until it becomes clear how
feasible it is.  Without having thought about it too hard, this sounds
potentially tricky and bug-prone.
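
For what it's worth, the easy part of one direction is nearly a
one-liner, which is exactly the trap -- it's field widths, precisions,
'%d' versus '%s', bare '%s' and literal '%%' that will bite.  A
deliberately naive sketch:

    import re

    def to_braces(fmt):
        # handles only the simple '%(name)s' case; everything else
        # (widths, precisions, '%d', positional '%s', '%%') is untouched
        return re.sub(r'%\((\w+)\)s', r'{\1}', fmt)

    assert to_braces("%(asctime)s - %(name)s") == "{asctime} - {name}"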

Mark


[Python-Dev] Backport new float repr to Python 2.7?

2009-10-11 Thread Mark Dickinson
In a recent #python-dev IRC conversation, it was suggested that we
should consider backporting the new-style float repr from py3k to
trunk.  I'd like to get people's opinions on this idea.

To recap quickly, the algorithm for computing the repr of floats changed
between Python 2.x and Python 3.x (well, actually between 3.0 and 3.1,
but 3.0 is dead):

 - in Python 2.x, repr(x) computes 17 significant decimal digits, and
   then strips trailing zeros.  In other words, it's pretty much identical
   to doing '%.17g' % x.  The computation is done using the platform's
   *printf functions.

 - in Python 3.x, repr(x) returns the shortest decimal string that's
   guaranteed to evaluate back to the float x under correct rounding.
   The computation is done using David Gay's dtoa.c code, adapted
   for inclusion in Python (in file Python/dtoa.c).
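
The difference is easiest to see at the prompt (3.x session; the first
line shows what the 2.x rule is built from):

    >>> '%.17g' % 0.1    # 17 significant digits, as in 2.x's repr
    '0.10000000000000001'
    >>> repr(0.1)        # 3.x: shortest string that round-trips
    '0.1'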

There are (in my view) many benefits to the new approach.  Among
them:

 - fewer newbie complaints and questions (on c.l.p, IRC, Stack
   Overflow, etc.) about Python 'rounding incorrectly'. Whether this is a
   good thing or not is a matter of some debate (I'm tempted to
   borrow the time machine and simply say 'see the replies
   to this message'!)

 - string to float *and* float to string conversions are both guaranteed
   correctly rounded in 3.x: David Gay's code implements the conversion
   in both directions, and having correctly rounded string -> float
   conversions is essential to ensure that eval(repr(x)) recovers x exactly.

 - the repr of round(x, n) really does have at most n digits after the
   point, giving the semi-illusion that x really has been rounded exactly,
   and eliminating one of the most common user complaints about the
   round function (see the comparison just after this list).

 - round(x, n) agrees exactly with '{:.{}f}'.format(x, n)  (this isn't
   true in Python 2.x, and the difference is a cause of bug reports)

 - side effects like finding that float(x) rounds correctly for
   Decimal instances x.

 - the output from the new rule is more consistent: the 'strip trailing
   zeros' part of the old rule has some strange consequences:  e.g.,
   in 2.x right now (on a typical machine):

   >>> 0.02
   0.02
   >>> 0.03
   0.029999999999999999

   even though neither 0.02 nor 0.03 can be exactly represented
   in binary.  3.x gives '0.02' and '0.03'.

 - repr(x) is consistent across platforms (or at least across platforms
   with IEEE 754 doubles;  in practice this seems to account for
   virtually all platforms currently running Python).

 - the float <-> string conversions are under our control, so any bugs
   found can be fixed in the Python source.  There's no shortage of
   conversion bugs in the wild, and certainly bugs have been observed in
   OS X, Linux and Windows.  (The ones I found in OS X 10.5 have
   been fixed in OS X 10.6, though.)
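
To make the round() point above concrete (typical IEEE 754 machine; the
exact 2.x digits can vary by platform, which is itself part of the
problem):

    # 2.x
    >>> round(2.675, 2)
    2.6699999999999999

    # 3.x
    >>> round(2.675, 2)
    2.67

The two results are the same float; only the repr differs.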

Possible problems:

 - breaking docstrings in third party code.  Though Eric reminded me
   that when we implemented this for 3.1, there were essentially no
   standard library test breakages resulting from the changed repr
   format.

 - some might argue that the new repr (and round) just allows users
   to remain ignorant of floating-point difficulties for longer, and that
   this is a bad thing.  I don't really buy either of these points.

 - someone has to put in the work.  As mentioned below, I'm happy
   to do this (and Eric's offered to help, without which this probably
   wouldn't be feasible at all), but it'll use cycles that I could also
   usefully be spending elsewhere.

I'm mostly neutral on the backport idea:  I'm very happy that this is
in 3.x, but don't see any great need to backport it.  But if there's
majority (+BDFL) support, I'm willing to put the work in to do the
backport.

Masochists who are still reading by this point and who want more
information about the new repr implementation can see the issue
discussion:

http://bugs.python.org/issue1580

Thoughts?

Mark


Re: [Python-Dev] Backport new float repr to Python 2.7?

2009-10-12 Thread Mark Dickinson
[Guido]
> I think you mean doctests? These are the primary reason I've always
> been hesitant to change this in 2.x.

Yes, sorry. I did of course mean doctests.

It occurs to me that any doctests that depend on the precise form of
repr(x) are, in a sense, already broken, since 2.x makes no guarantees
about repr(x) being consistent across platforms.  It's just an accident
that repr(x) in 2.x pretty much *is* consistent across major platforms,
so long as you steer clear of IEEE 754 oddities like subnormals, nans
and infinities.
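
A contrived but representative example: a doctest like

    def half(x):
        """
        >>> half(0.3)
        0.14999999999999999
        """
        return x / 2

passes under the 2.x repr on common platforms and fails under the 3.x
rule (which prints 0.15), but it was never testing anything portable in
the first place.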

[Glyph]
> I'd much rather have my doctests and float-repr'ing code break on
> 2.7 so I can deal with it as part of a minor-version upgrade than
> have it break on 3.x and have to deal with this at the same time
> as the unicode->str explosion.  It feels like a backport of this
> behavior would make the 2->3 transition itself a little easier.

I hadn't really thought about this aspect.  I find this quite convincing.

[Guido]
> PS. str(x) still seems to be using %.12g -- shouldn't it be made equal
> to repr() in 3.1 or 3.2? *That* I would call a bug, an oversight.

But str still has some value in py3k:  it protects users from
accumulated rounding errors produced by arithmetic operations:

Python 3.2a0 (py3k:75216:75220, Oct  3 2009, 21:38:04)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.1 + 0.2
0.30000000000000004
>>> 1.23 * 4.64
5.707199999999999
>>> str(0.1 + 0.2)
'0.3'
>>> str(1.23 * 4.64)
'5.7072'

I don't know whether this makes it worth keeping str different from repr.
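
(For reference, the str results above really are just '%.12g':

    >>> '%.12g' % (0.1 + 0.2)
    '0.3'
    >>> '%.12g' % (1.23 * 4.64)
    '5.7072'

so str is simply hiding any digits beyond the twelfth.)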

Mark


Re: [Python-Dev] Backport new float repr to Python 2.7?

2009-10-12 Thread Mark Dickinson
On Mon, Oct 12, 2009 at 7:48 PM, Guido van Rossum  wrote:
> On Mon, Oct 12, 2009 at 11:41 AM, Mark Dickinson  wrote:
>> But str still has some value in py3k:  it protects users from
>> accumulated rounding errors produced by arithmetic operations:
> [...]
>
> I know, but this is much more questionable now. [...]

Would it be out of the question to make str = repr in 3.2, do you think?

Mark


Re: [Python-Dev] Can 3.1 still be built without complex?

2009-10-15 Thread Mark Dickinson
[I originally sent this reply to Skip instead of to the list;  apologies.]

On Thu, Oct 15, 2009 at 12:39 PM,   wrote:
> I notice that WITHOUT_COMPLEX still appears in Python.h and several .c files
> but nowhere else in the 2.6, 2.7 or 3.1 source, most particularly not in
> configure or pyconfig.h.in.  Are builds --without-complex still supported?
> Has it been tested at any time in the recent past?

Apparently not.  :)

I just tried the following with an svn checkout of trunk (r75433), on OS X 10.6:

dickinsm$ CC='gcc -DWITHOUT_COMPLEX' ./configure && make

The build fails with:

gcc -DWITHOUT_COMPLEX -c -fno-strict-aliasing -DNDEBUG -g  -O3 -Wall
-Wstrict-prototypes  -I. -IInclude -I./Include   -DPy_BUILD_CORE -o
Python/compile.o Python/compile.c
Python/compile.c: In function ‘compiler_add_o’:
Python/compile.c:914: error: ‘Py_complex’ undeclared (first use in
this function)
Python/compile.c:914: error: (Each undeclared identifier is reported only once
Python/compile.c:914: error: for each function it appears in.)
Python/compile.c:914: error: expected ‘;’ before ‘z’
Python/compile.c:931: warning: implicit declaration of function
‘PyComplex_Check’
Python/compile.c:937: error: ‘z’ undeclared (first use in this function)
Python/compile.c:937: warning: implicit declaration of function
‘PyComplex_AsCComplex’
make: *** [Python/compile.o] Error 1

Mark

Postscript:  the above compilation failure is easily fixed.  The next
failure is:

gcc -DWITHOUT_COMPLEX  -u _PyMac_Error -o python.exe \
Modules/python.o \
libpython2.7.a -ldl
Undefined symbols:
  "_PyComplex_RealAsDouble", referenced from:
  __PyComplex_FormatAdvanced in libpython2.7.a(formatter_string.o)
  "_PyComplex_ImagAsDouble", referenced from:
  __PyComplex_FormatAdvanced in libpython2.7.a(formatter_string.o)
ld: symbol(s) not found
collect2: ld returned 1 exit status
make: *** [python.exe] Error 1


Re: [Python-Dev] Can 3.1 still be built without complex?

2009-10-15 Thread Mark Dickinson
On Thu, Oct 15, 2009 at 4:06 PM, Antoine Pitrou  wrote:
>  pobox.com> writes:
>>
>> I notice that WITHOUT_COMPLEX still appears in Python.h and several .c files
>> but nowhere else in the 2.6, 2.7 or 3.1 source, most particularly not in
>> configure or pyconfig.h.in.  Are builds --without-complex still supported?
>> Has it been tested at any time in the recent past?
>
> Is there any point in building without complex? Size reduction perhaps?
> If nobody uses it, we could remove that option. We have trouble staying
> compatible with lots of build options (see how --without-threads is little
> exercised).

Size reduction is the only point I can think of.

There's one respect in which complex is slightly more tightly
integrated in py3k than in trunk:  raising a negative number to a
non-integer power (e.g., (-1)**0.5) gives a complex result in py3k.

In trunk this raises ValueError, which means that the only way to
get a complex number in trunk is if you explicitly ask for one
somehow (e.g., by invoking complex, or using the cmath module,
or using imaginary literals, ...), so it makes slightly more sense
to build without the complex type there.
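
Concretely, on one machine (the tiny real part comes from the exp/log
implementation and varies with the platform's libm):

    # py3k
    >>> (-1)**0.5
    (6.123031769111886e-17+1j)

    # trunk
    >>> (-1)**0.5
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: negative number cannot be raised to a fractional power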

+1 for removing the WITHOUT_COMPLEX define from py3k.
-0 for removing it from trunk.

Mark


Re: [Python-Dev] Can 3.1 still be built without complex?

2009-10-15 Thread Mark Dickinson
On Thu, Oct 15, 2009 at 8:17 PM, Antoine Pitrou  wrote:
> >>> (-1)**.5
> (6.123031769111886e-17+1j)
>
> Don't we have a precision problem here? 0.5 is supposed to be represented
> exactly, isn't it?

0.5 is represented exactly, but complex.__pow__ makes no pretence of
being correctly rounded (and making it correctly rounded would likely
be prohibitively expensive in terms of code size and complexity).  It's
using something like x**y = exp(y*log(x)) behind the scenes, at least
for computing the argument of the result.

For square roots, cmath.sqrt produces better results.
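
Compare:

    >>> import cmath
    >>> cmath.sqrt(-1)    # correctly rounded; no spurious real part
    1j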

Mark


Re: [Python-Dev] SIGCHECK() in longobject.c

2009-10-18 Thread Mark Dickinson
On Sun, Oct 18, 2009 at 9:01 PM, Antoine Pitrou  wrote:

> In Objects/longobject.c, there's the SIGCHECK() macro which periodically 
> checks
> for signals when doing long integer computations (divisions, multiplications).
> It does so by messing with the _Py_Ticker variable.
>
> It was added in 1991 under the title "Many small changes", and I suppose it 
> was
> useful back then.
>
> However, nowadays long objects are ridiculously fast, witness for example:
>
> $ ./py3k/python -m timeit -s "a=eval('3'*10000+'5');b=eval('8'*6000+'7')"
> "str(a//b)"
> 1000 loops, best of 3: 1.47 msec per loop
>
> Can we remove this check, or are there people doing million-digits 
> calculations
> they want to interrupt using Control-C ?

Yes, I suspect there are.  Though you don't need millions of digits for a single
operation to take a noticeable amount of time:  try str(10**100000),
for example.

Is there a benefit to removing the check?

Mark

