[Python-Dev] Documenting Python's float.__str__()

2020-01-20 Thread Karl O. Pinc
Hello,

There appears to be extremely minimal documentation on how floats are
formatted on output.  All I really see is that float.__str__() is
float.__repr__().  So that means that float->str->float does not
result in a different value.
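
A minimal sketch of that round-trip property, assuming CPython 3.2 or
later with its default float repr.  The sample values are arbitrary and
chosen only for illustration:

```python
# Arbitrary sample values; none of them is a NaN (NaN never compares equal).
samples = [0.1, 1.0, 1 / 3, 1e22, 5e-324, -2.5, float("inf")]

for x in samples:
    text = str(x)
    # str() and repr() give the same text for floats in Python 3.2 and later.
    assert text == repr(x)
    # Parsing the text back yields the same value.
    assert float(text) == x

round_trips = all(float(str(x)) == x for x in samples)
```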

It would be nice if the output format for float was documented, to the
extent this is possible.  #python suggested that I propose a patch,
but I see no way to write a documentation patch without having any
clue about what Python promises, whether in the CPython implementation
or as part of a specification for Python.

What are the promises Python makes about the str() of a float?  Will
it produce 1.0 today and 1.0e0 or +1.0 tomorrow?  When is the result
in exponential notation and when not?  Does any of this depend on the
underlying OS or hardware or Python implementation?  Etc.
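
For what it is worth, the behavior of current CPython can be probed
directly.  The sketch below records where repr() was observed to switch
to exponential notation when sys.float_repr_style is 'short'.  These are
observations of one implementation, not documented guarantees, and the
helper name is made up for illustration:

```python
import sys

# Observed CPython behavior with the "short" repr style; not a documented promise.
observed = {
    1.0: "1.0",
    1e15: "1000000000000000.0",
    1e16: "1e+16",      # exponential notation starts at 1e16
    0.0001: "0.0001",
    0.00001: "1e-05",   # exponential notation starts below 1e-4
}

def matches_observed() -> bool:
    """Return True if repr() agrees with every observation above."""
    return all(repr(value) == text for value, text in observed.items())

if sys.float_repr_style == "short":
    print(matches_observed())
```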

I'm guessing that Python is consistent with an IEEE 754
"external character sequence", but I don't know what the IEEE
specification says or whether Python conforms.

I don't really care whether there's documentation for __str__() or
__repr__() or something else.  I'm just thinking that there should be
some way to guarantee a well defined "useful" float output formatting.
By "useful" I mean in exponential notation when non-exponential
notation is over-long.

I am writing a program that sometimes prints python floats and want to
be able to document what is printed.  Right now I can't truly
guarantee anything, other than the nan and inf and -inf
representations.  (I feel comfortable with nan and the like because I
don't see it likely that their representations will change.)  Of
course I could always re-implement Python's float.__repr__() in Python
so as to have full control, but this should be pointless.  Python's
output representation is unlikely to change and Python should be able
to make sufficient promises about its existing float representation.

I suppose there are similar issues with integers, but the varieties of
floating point number implementations and the existence of both
exponential and non-exponential representations make float
particularly problematic and representations potentially mercurial.

I also don't know if documentation changes with regard to external
representations would require a PEP.

I have found the following related information:

Use shorter float repr when possible
https://bugs.python.org/issue1580

https://github.com/python/cpython/blob/master/Python/pystrtod.c#L831

String conversion and formatting
https://docs.python.org/3/c-api/conversion.html

sys.float_repr_style
https://docs.python.org/3/library/sys.html#sys.float_repr_style

object.__str__(self)
https://docs.python.org/3/reference/datamodel.html#object.__str__

At the end of the day I don't _really_ care.  But having put thought
into the matter I care enough to write this email and ask the
question.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FV22TKT3S2Q3P7PNN6MCXI6IX3HRRNAL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Documenting Python's float.__str__()

2020-01-21 Thread Karl O. Pinc
On Tue, 21 Jan 2020 21:09:57 +1100
Steven D'Aprano wrote:

> On Mon, Jan 20, 2020 at 09:59:07PM -0600, Karl O. Pinc wrote:
> 
> > It would be nice if the output format for float was documented, to
> > the extent this is possible.  
> 
> I don't think we should make any promises about the repr() of floats. 
> We've already changed the format at least twice:
> 
> - once to switch to the shortest unambiguous representation;
> - and once to shift to a more consistent output for NANs.
> 
> (NANs on Windows prior to 2.6 used to be displayed as '1.#IND', if I 
> recall correctly.)
> 
> We may never want to change output format again, but if we document a 
> certain format that will be read by people as a guarantee, and that 
> closes the door to any change without a long and tedious deprecation 
> period.

Understood.  But you still might want to document, or even define in the
language, that you're outputting the shortest unambiguous
representation.  Or other such broad principles like IEEE 754
representation compatibility.  This is a suggestion; I don't want to
advocate.

> If anyone wants a guaranteed output format for floats, they ought to
> use the various string formatting operations, which offer guaranteed 
> formatting outputs. Or build your own formatter.
> 
> I think that the most we should promise is that (with the exception
> of NANs) float -> repr -> float should round-trip with no change in
> value.

That would be nice, and is the sort of general principle I'm thinking
of.
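
A short sketch of why NaN has to be excluded from such a promise.  It
relies only on standard IEEE 754 comparison semantics:

```python
import math

nan = float("nan")
restored = float(repr(nan))

# The text form is stable...
assert repr(nan) == "nan"
# ...and parsing it gives a NaN again,
assert math.isnan(restored)
# but NaN never compares equal, so an ==-based round-trip check fails.
assert restored != nan
```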

Another one might be "a sign is only printed for negative numbers".

I guess I will advocate for _some_ specification built into Python's
definition.  Otherwise everybody should _always_ build their own
formatter, lest they wake up one morning and find that int zero prints
as "+0".
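
To illustrate the point about signs: the checks below show what current
CPython prints by default, next to explicit format specs that pin the
sign behavior.  The default output is an observation, not a promise.
The format specs are the documented route:

```python
# Default conversions print a sign only for negative values...
assert str(0) == "0"
assert str(1.5) == "1.5"
assert str(-1.5) == "-1.5"
# ...which in current CPython includes negative zero for floats.
assert str(-0.0) == "-0.0"

# An explicit format spec pins the sign behavior regardless of the default.
always_signed = format(0, "+d")      # '+0'
minus_only = format(1.5, "-.1f")     # '1.5'
```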

As mentioned, parts of this discussion could also apply to other
numeric types.

> > I don't really care whether there's documentation for __str__() or
> > __repr__() or something else.  I'm just thinking that there should
> > be some way to guarantee a well defined "useful" float output
> > formatting.  
> 
> https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
> 
> https://docs.python.org/3/library/string.html#format-string-syntax

Thanks.  For some reason nobody in #python pointed me to the 'g' format
type.  That resolves my issue.

Unfortunately, because 'g' can strip the trailing ".0", floats
formatted with it no longer satisfy the float->str->float
round-trip property.  I can always use logic along the lines of:

  out = f'{num:g}'
  print(out if 'e' in out or '.' in out else f'{out}.0')

(With handling for INF and NAN.)
A cleaner format would be nice but this works.
(The #g format leaves multiple trailing zeros, which is
too different from the "minimal" form __repr__() produces.)
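
A sketch of that logic with the INF and NAN handling filled in.  The
function name format_float is hypothetical, and this is one possible
arrangement rather than a recommended formatter:

```python
import math

def format_float(num: float) -> str:
    """Hypothetical helper: 'g' formatting that keeps a float-looking result."""
    out = f"{num:g}"
    if math.isnan(num) or math.isinf(num):
        return out  # 'nan', 'inf' or '-inf'; no '.0' suffix wanted
    # Restore the '.0' that 'g' strips from whole numbers.
    return out if "e" in out or "." in out else f"{out}.0"

print(format_float(1.0))           # 1.0
print(format_float(0.5))           # 0.5
print(format_float(1e20))          # 1e+20
print(format_float(float("inf")))  # inf
```

One caveat, which is an observation and not something raised earlier in
this thread: 'g' without an explicit precision keeps only six
significant digits, so f'{0.1234567:g}' gives '0.123457' and the value
does not round-trip.  A precision of '.17g' is enough to round-trip any
IEEE 754 double, at the cost of output that is usually longer than what
__repr__() produces.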

FYI.  It wouldn't hurt to have the PyOS_double_to_string() docs
https://docs.python.org/3/c-api/conversion.html point out that "format"
uses the codes as defined in your formatting links above.  Digging
around got me to PyOS_double_to_string() whereupon I was left in
the dark about the meaning of the "format" codes.

Thank you all for the help.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MP5OKKVGWLCCYJE7EQ2DOPXFHACGTRN4/


[Python-Dev] Re: Documenting Python's float.__str__()

2020-01-21 Thread Karl O. Pinc
On Tue, 21 Jan 2020 09:01:29 -0600
"Karl O. Pinc" wrote:

> I guess I will advocate for _some_ specification built into Python's
> definition.  Otherwise everybody should _always_ build their own
> formatter; lest they wake up one morning and find that int zero prints
> as "+0".

Having made a suggestion, I've followed up with a pull request.
https://github.com/python/cpython/pull/18111

I think I have come up with a very minimal and sane
set of restrictions on the default Numeric string
representations.  Having done that, I'm less interested
in spending a lot more time on this.

I'd be happy to explain my wording choices, and equally
happy to have the pull request immediately rejected.

The pull request is presently failing the check for news.
(I'm not entirely clear on how to satisfy the requirement,
or whether I could come up with a good news entry.
I'll wait to resolve this until it looks like the patch
is going somewhere.)

There should probably also be unit tests.  But again,
I'll wait to see if this is going anywhere.

FYI, it was remarkably easy to build the docs.  But the
contribution process goes through an annoying number
of corporations (GitHub, the contributor signature...)
and login steps.

(The contributor signature needs to clear at your end.)

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FDC772QSZB5IE7TY4DQILHWBZS2WYKKQ/