[issue15443] datetime module has no support for nanoseconds
Gareth Rees added the comment:

I also have a use case that would benefit from nanosecond resolution in Python's datetime objects: representing and querying the results of clock_gettime() in a program trace. On modern Linuxes with a vDSO, clock_gettime() does not require a system call and completes within a few nanoseconds, so Python's datetime objects (which have only microsecond resolution) cannot distinguish between adjacent calls to clock_gettime(). This means that, like Mark Dickinson above, I have to choose between using datetime for queries (which would be convenient) while accepting that nearby events in the trace may be indistinguishable, or implementing my own datetime-like data structure.

--
nosy: +g...@garethrees.org
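A minimal sketch of the resolution problem. It uses time.clock_gettime_ns(), which was added in Python 3.7 (later than this comment) and is Unix-only; exact timings are machine-dependent:

    import time
    from datetime import datetime, timezone

    # Two adjacent clock readings, typically a few tens of
    # nanoseconds apart.
    t1 = time.clock_gettime_ns(time.CLOCK_REALTIME)
    t2 = time.clock_gettime_ns(time.CLOCK_REALTIME)

    # Converting to datetime truncates to microsecond resolution, so
    # distinct readings can collapse into equal datetime objects.
    d1 = datetime.fromtimestamp(t1 / 1e9, tz=timezone.utc)
    d2 = datetime.fromtimestamp(t2 / 1e9, tz=timezone.utc)
    print(t1 == t2)  # almost always False
    print(d1 == d2)  # often True: the nanoseconds have been lost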
[issue46065] re.findall takes forever and never ends
Gareth Rees added the comment:

The way to avoid this behaviour is to disallow the attempts at matching that you know are going to fail. As Serhiy described above, if the search fails starting at the first character of the string, it will move forward and try again starting at the second character. But you know that this new attempt must fail, so you can force the regular expression engine to discard the attempt immediately.

Here's an illustration in a simpler setting, where we are looking for all strings of 'a' followed by 'b':

    >>> import re
    >>> from timeit import timeit
    >>> text = 'a' * 10
    >>> timeit(lambda: re.findall(r'a+b', text), number=1)
    6.64353118114

We know that any successful match must be preceded by a character other than 'a' (or the beginning of the string), so we can reject many unsuccessful matches like this:

    >>> timeit(lambda: re.findall(r'(?:^|[^a])(a+b)', text), number=1)
    0.00374348114981

In your case, a successful match must be preceded by [^a-zA-Z0-9_.+-] (or the beginning of the string).

--
nosy: +g...@garethrees.org
[issue46065] re.findall takes forever and never ends
Gareth Rees added the comment:

This kind of question is frequently asked (#3128, #29977, #28690, #30973, #1737127, etc.), and so maybe it deserves an answer somewhere in the Python documentation.

--
resolution: -> wont fix
stage: -> resolved
status: open -> closed
[issue12514] timeit disables garbage collection if timed code raises an exception
New submission from Gareth Rees:

If you call timeit.timeit and the timed code raises an exception, then garbage collection is disabled. I have verified this in Python 2.7 and 3.2. Here's an interaction with Python 3.2:

    Python 3.2 (r32:88445, Jul 7 2011, 15:52:49)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import timeit, gc
    >>> gc.isenabled()
    True
    >>> timeit.timeit('raise Exception')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/timeit.py", line 228, in timeit
        return Timer(stmt, setup, timer).timeit(number)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/timeit.py", line 194, in timeit
        timing = self.inner(it, self.timer)
      File "<timeit-src>", line 6, in inner
    Exception
    >>> gc.isenabled()
    False

The problem is with the following code in Lib/timeit.py (lines 192–196):

    gcold = gc.isenabled()
    gc.disable()
    timing = self.inner(it, self.timer)
    if gcold:
        gc.enable()

This should be changed to something like this:

    gcold = gc.isenabled()
    gc.disable()
    try:
        timing = self.inner(it, self.timer)
    finally:
        if gcold:
            gc.enable()

--
components: Library (Lib)
messages: 139978
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: timeit disables garbage collection if timed code raises an exception
type: behavior
versions: Python 2.7, Python 3.2
[issue12514] timeit disables garbage collection if timed code raises an exception
Gareth Rees added the comment:

Patch attached.

--
keywords: +patch
Added file: http://bugs.python.org/file22605/issue12514.patch
[issue12675] tokenize module happily tokenizes code with syntax errors
New submission from Gareth Rees:

The tokenize module is happy to tokenize Python source code that the real tokenizer would reject. Pretty much any instance where tokenizer.c returns ERRORTOKEN will illustrate this feature. Here are some examples:

    Python 3.3.0a0 (default:2d69900c0820, Aug 1 2011, 13:46:51)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from tokenize import generate_tokens
    >>> from io import StringIO
    >>> def tokens(s):
    ...     """Return a string showing the tokens in the string s."""
    ...     return '|'.join(t[1] for t in generate_tokens(StringIO(s).readline))
    ...
    >>> # Bad exponent
    >>> print(tokens('1if 2else 3'))
    1|if|2|else|3|
    >>> 1if 2else 3
      File "<stdin>", line 1
        1if 2else 3
             ^
    SyntaxError: invalid token
    >>> # Bad hexadecimal constant.
    >>> print(tokens('0xfg'))
    0xf|g|
    >>> 0xfg
      File "<stdin>", line 1
        0xfg
           ^
    SyntaxError: invalid syntax
    >>> # Missing newline after continuation character.
    >>> print(tokens('\\pass'))
    \|pass|
    >>> \pass
      File "<stdin>", line 1
        \pass
            ^
    SyntaxError: unexpected character after line continuation character

It is surprising that the tokenize module does not yield the same tokens as Python itself, but as this limitation only affects incorrect Python code, perhaps it just needs a mention in the tokenize documentation. Something along the lines of: "The tokenize module generates the same tokens as Python's own tokenizer if it is given correct Python code. However, it may incorrectly tokenize Python code containing syntax errors that the real tokenizer would reject."

--
components: Library (Lib)
messages: 141503
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: tokenize module happily tokenizes code with syntax errors
type: behavior
versions: Python 3.3
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees added the comment:

These errors are generated directly by the tokenizer. In tokenizer.c, the tokenizer generates ERRORTOKEN when it encounters something it can't tokenize. This causes parsetok() in parsetok.c to stop tokenizing and return an error.
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees added the comment:

I'm having a look to see if I can make tokenize.py better match the real tokenizer, but I need some feedback on a couple of design decisions.

First, how to handle tokenization errors? There are three possibilities:

1. Generate an ERRORTOKEN, resynchronize, and continue to tokenize from after the error. This is what tokenize.py currently does in the two cases where it detects an error.

2. Generate an ERRORTOKEN and stop tokenizing. This is what tokenizer.c does.

3. Raise an exception (IndentationError, SyntaxError, or TabError). This is what the user sees when the parser is invoked from pythonrun.c.

Since the documentation for tokenize.py says, "It is designed to match the working of the Python tokenizer exactly", I think that implementing option (2) is best here. (This will mean changing the behaviour of tokenize.py in the two cases where it currently detects an error, so that it stops tokenizing.)

Second, how to record the cause of the error? The real tokenizer records the cause of the error in the 'done' field of the 'tok_state' structure, but tokenize.py loses this information. I propose to add fields to the TokenInfo structure (which is a namedtuple) to record this information. The real tokenizer uses numeric constants from errcode.h (E_TOODEEP, E_TABSPACE, E_DEDENT, etc.), and pythonrun.c converts these to English-language error messages (E_TOODEEP: "too many levels of indentation"). Both of these pieces of information will be useful, so I propose to add two fields: "error" (containing a string like "TOODEEP") and "errormessage" (containing the English-language error message).
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees added the comment:

Having looked at some of the consumers of the tokenize module, I don't think my proposed solutions will work.

It seems to be the case that the resynchronization behaviour of tokenize.py is important for consumers that are using it to transform arbitrary Python source code (like 2to3.py). These consumers are relying on the "roundtrip" property that X == untokenize(tokenize(X)). So solution (1) is necessary for the handling of tokenization errors.

Also, the fact that TokenInfo is a 5-tuple is relied on in some places (e.g. lib2to3/patcomp.py line 38), so it can't be extended. And there are consumers (though none in the standard library) that are relying on type=ERRORTOKEN being the way to detect errors in a tokenization stream. So I can't overload that field of the structure.

Any good ideas for how to record the cause of error without breaking backwards compatibility?
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees added the comment:

Ah ... TokenInfo is a *subclass* of namedtuple, so I can add extra properties to it without breaking consumers that expect it to be a 5-tuple.
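A minimal sketch of why this works. Here is_error is a hypothetical extra property for illustration, not something from the eventual patch:

    from collections import namedtuple
    from token import ERRORTOKEN

    class TokenInfo(namedtuple('TokenInfo',
                               ['type', 'string', 'start', 'end', 'line'])):
        # Extra behaviour layered on top of the 5-tuple; code that
        # unpacks five elements keeps working unchanged.
        @property
        def is_error(self):
            return self.type == ERRORTOKEN

    t = TokenInfo(ERRORTOKEN, '!', (1, 0), (1, 1), '!\n')
    a, b, c, d, e = t   # still unpacks as a 5-tuple
    print(t.is_error)   # True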
[issue12691] tokenize.untokenize is broken
New submission from Gareth Rees:

tokenize.untokenize is completely broken.

    Python 3.2.1 (default, Jul 19 2011, 00:09:43)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tokenize, io
    >>> t = list(tokenize.tokenize(io.BytesIO('1+1'.encode('utf8')).readline))
    >>> tokenize.untokenize(t)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 250, in untokenize
        out = ut.untokenize(iterable)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 179, in untokenize
        self.add_whitespace(start)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 165, in add_whitespace
        assert row <= self.prev_row
    AssertionError

The assertion is simply bogus: the <= should be >=. The reason why no-one has spotted this is that the unit tests for the tokenize module only ever call untokenize() in "compatibility" mode, passing in a 2-tuple instead of a 5-tuple.

I propose to fix this, and add unit tests, at the same time as fixing other problems with tokenize.py (issue12675).

--
components: Library (Lib)
messages: 141634
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: tokenize.untokenize is broken
type: behavior
versions: Python 3.2, Python 3.3
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment:

See my last paragraph: I propose to deliver a single patch that fixes both this bug and issue12675. I hope this is OK. (If you prefer, I'll try to split the patch in two.)

I just noticed another bug in untokenize(): in compatibility mode, if untokenize() is passed an iterator rather than a list, then the first token gets discarded:

    Python 3.3.0a0 (default:c099ba0a278e, Aug 2 2011, 12:35:03)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from tokenize import untokenize
    >>> from token import *
    >>> untokenize([(NAME, 'hello')])
    'hello '
    >>> untokenize(iter([(NAME, 'hello')]))
    ''

No-one's noticed this because the unit tests only ever pass lists to untokenize().
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment:

I think I can make these changes independently and issue two patches: one fixing the problems with untokenize listed here, and another improving tokenize.

I've just noticed a third bug in untokenize: in full mode, it doesn't handle backslash-continued lines correctly.

    Python 3.3.0a0 (default:c099ba0a278e, Aug 2 2011, 12:35:03)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from io import BytesIO
    >>> from tokenize import tokenize, untokenize
    >>> untokenize(tokenize(BytesIO('1 and \\\n not 2'.encode('utf8')).readline))
    b'1 andnot 2'
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees added the comment:

Terry: agreed. Does anyone actually use this module? Does anyone know what the design goals are for tokenize? If someone can tell me, I'll do my best to make it meet them.

Meanwhile, here's another bug. Each character of trailing whitespace is tokenized as an ERRORTOKEN.

    Python 3.3.0a0 (default:c099ba0a278e, Aug 2 2011, 12:35:03)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from tokenize import tokenize, untokenize
    >>> from io import BytesIO
    >>> list(tokenize(BytesIO('1 '.encode('utf8')).readline))
    [TokenInfo(type=57 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
     TokenInfo(type=2 (NUMBER), string='1', start=(1, 0), end=(1, 1), line='1 '),
     TokenInfo(type=54 (ERRORTOKEN), string=' ', start=(1, 1), end=(1, 2), line='1 '),
     TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment:

Please find attached a patch containing four bug fixes for untokenize():

* untokenize() now always returns a bytes object, defaulting to UTF-8 if no ENCODING token is found (previously it returned a string in this case).
* In compatibility mode, untokenize() successfully processes all tokens from an iterator (previously it discarded the first token).
* In full mode, untokenize() now returns successfully (previously it asserted).
* In full mode, untokenize() successfully processes tokens that were separated by a backslashed newline in the original source (previously it ran these tokens together).

In addition, I've added some unit tests:

* Test case for backslashed newline.
* Test case for missing ENCODING token.
* roundtrip() tests both modes of untokenize() (previously it just tested compatibility mode).

and updated the documentation:

* Update the docstring for untokenize to better describe its actual behaviour, and remove the false claim "Untokenized source will match input source exactly". (We can restore this claim if we ever fix tokenize/untokenize so that it's true.)
* Update the documentation for untokenize in tokenize.rst to match the docstring.

I welcome review: this is my first proper patch to Python.

--
keywords: +patch
Added file: http://bugs.python.org/file22842/Issue12691.patch
[issue12700] test_faulthandler fails on Mac OS X Lion
New submission from Gareth Rees:

On Mac OS 10.7, test_faulthandler fails. See test output below. It looks as though the tests may be at fault in expecting to see "(?:Segmentation fault|Bus error)" instead of "(?:Segmentation fault|Bus error|Illegal instruction)".

    test_disable (__main__.FaultHandlerTests) ... ok
    test_dump_traceback (__main__.FaultHandlerTests) ... ok
    test_dump_traceback_file (__main__.FaultHandlerTests) ... ok
    test_dump_traceback_threads (__main__.FaultHandlerTests) ... ok
    test_dump_traceback_threads_file (__main__.FaultHandlerTests) ... ok
    test_dump_tracebacks_later (__main__.FaultHandlerTests) ... ok
    test_dump_tracebacks_later_cancel (__main__.FaultHandlerTests) ... ok
    test_dump_tracebacks_later_file (__main__.FaultHandlerTests) ... ok
    test_dump_tracebacks_later_repeat (__main__.FaultHandlerTests) ... ok
    test_dump_tracebacks_later_twice (__main__.FaultHandlerTests) ... ok
    test_enable_file (__main__.FaultHandlerTests) ... FAIL
    test_enable_single_thread (__main__.FaultHandlerTests) ... FAIL
    test_fatal_error (__main__.FaultHandlerTests) ... ok
    test_gil_released (__main__.FaultHandlerTests) ... FAIL
    test_is_enabled (__main__.FaultHandlerTests) ... ok
    test_read_null (__main__.FaultHandlerTests) ... FAIL
    test_register (__main__.FaultHandlerTests) ... ok
    test_register_chain (__main__.FaultHandlerTests) ... ok
    test_register_file (__main__.FaultHandlerTests) ... ok
    test_register_threads (__main__.FaultHandlerTests) ... ok
    test_sigabrt (__main__.FaultHandlerTests) ... ok
    test_sigbus (__main__.FaultHandlerTests) ... ok
    test_sigfpe (__main__.FaultHandlerTests) ... ok
    test_sigill (__main__.FaultHandlerTests) ... ok
    test_sigsegv (__main__.FaultHandlerTests) ... ok
    test_stack_overflow (__main__.FaultHandlerTests) ... ok
    test_unregister (__main__.FaultHandlerTests) ... ok

    ======================================================================
    FAIL: test_enable_file (__main__.FaultHandlerTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_faulthandler.py", line 207, in test_enable_file
        filename=filename)
      File "test_faulthandler.py", line 105, in check_fatal_error
        self.assertRegex(output, regex)
    AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nCurrent\\ thread\\ XXX:\n  File "<string>", line 4 in <module>$' not found in 'Fatal Python error: Illegal instruction\n\nCurrent thread XXX:\n  File "<string>", line 4 in <module>'

    ======================================================================
    FAIL: test_enable_single_thread (__main__.FaultHandlerTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_faulthandler.py", line 217, in test_enable_single_thread
        all_threads=False)
      File "test_faulthandler.py", line 105, in check_fatal_error
        self.assertRegex(output, regex)
    AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nTraceback\\ \\(most\\ recent\\ call\\ first\\):\n  File "<string>", line 3 in <module>$' not found in 'Fatal Python error: Illegal instruction\n\nTraceback (most recent call first):\n  File "<string>", line 3 in <module>'

    ======================================================================
    FAIL: test_gil_released (__main__.FaultHandlerTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_faulthandler.py", line 195, in test_gil_released
        '(?:Segmentation fault|Bus error)')
      File "test_faulthandler.py", line 105, in check_fatal_error
        self.assertRegex(output, regex)
    AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nCurrent\\ thread\\ XXX:\n  File "<string>", line 3 in <module>$' not found in 'Fatal Python error: Illegal instruction\n\nCurrent thread XXX:\n  File "<string>", line 3 in <module>'

    ======================================================================
    FAIL: test_read_null (__main__.FaultHandlerTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_faulthandler.py", line 115, in test_read_null
        '(?:Segmentation fault|Bus error)')
      File "test_faulthandler.py", line 105, in check_fatal_error
        self.assertRegex(output, regex)
    AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nCurrent\\ thread\\ XXX:\n  File "<string>", line 3 in <module>$' not found in 'Fatal Pyth
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment:

Thanks Ezio for the review. I've made all the changes you requested, except for the re-ordering of paragraphs in the documentation, which I don't want to do because that would lead to the "round-trip property" being mentioned before it's defined. Revised patch attached.

--
Added file: http://bugs.python.org/file22844/Issue12691.patch
[issue12700] test_faulthandler fails on Mac OS X Lion
Gareth Rees added the comment:

After changing NULL to (int *)1, all tests pass.
[issue12700] test_faulthandler fails on Mac OS X Lion
Gareth Rees added the comment:

All tests now pass.
[issue45476] [C API] Convert "AS" functions, like PyFloat_AS_DOUBLE(), to static inline functions
Gareth Rees added the comment:

If the problem is accidental use of the result of PyFloat_AS_DOUBLE() as an lvalue, why not use the comma operator to ensure that the result is an rvalue? The C99 standard says "A comma operator does not yield an lvalue" in §6.5.17; I imagine there is similar text in other versions of the standard.

The idea would be to define a helper macro like this:

    /* As expr, but can only be used as an rvalue. */
    #define Py_RVALUE(expr) ((void)0, (expr))

and then use the helper where needed, for example:

    #define PyFloat_AS_DOUBLE(op) Py_RVALUE(((PyFloatObject *)(op))->ob_fval)

--
nosy: +g...@garethrees.org
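A self-contained sketch of the idea; RVALUE is a stand-in name for the proposed Py_RVALUE, so the example compiles outside CPython:

    #include <stdio.h>

    /* As expr, but usable only as an rvalue: a comma expression is
       not an lvalue (C99 §6.5.17). */
    #define RVALUE(expr) ((void)0, (expr))

    int main(void)
    {
        int x = 1;
        int y = RVALUE(x);   /* reading through the macro is fine */
        /* RVALUE(x) = 2; */ /* compile error: not assignable */
        printf("%d\n", y);
        return 0;
    }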
[issue45643] SIGSTKFLT is missing from the signals module on Linux
New submission from Gareth Rees:

BACKGROUND

On Linux, "man 7 signal" includes SIGSTKFLT in its table of "various other signals":

    Signal      Value     Action   Comment
    ─────────────────────────────────────────────────────────────────
    SIGSTKFLT   -,16,-    Term     Stack fault on coprocessor (unused)

Here "-,16,-" means that the signal is defined with the value 16 on x86 and ARM but not on Alpha, SPARC or MIPS. I believe that the intention was to use SIGSTKFLT for stack faults on the x87 math coprocessor, but this was either removed or never implemented, so that the signal is defined in /usr/include/signal.h but not used by the Linux kernel.

USE CASE

SIGSTKFLT is one of a handful of signals that are not used by the kernel, so that user-space programs are free to use it for their own purposes, for example for inter-thread or inter-process pre-emptive communication. Accordingly, it would be nice if the name SIGSTKFLT were available in the Python signal module on the platforms where the signal is available, for use and reporting in these cases.

--
components: Library (Lib)
messages: 405174
nosy: g...@garethrees.org
priority: normal
severity: normal
status: open
title: SIGSTKFLT is missing from the signals module on Linux
type: enhancement
versions: Python 3.11
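A sketch of the intended use, assuming Linux on x86/ARM where the signal's value is 16; the getattr fallback covers Pythons that do not yet export the name (which is the point of this issue):

    import os
    import signal

    # Fall back to the raw value on Pythons without signal.SIGSTKFLT.
    SIGSTKFLT = getattr(signal, "SIGSTKFLT", 16)

    def handler(signum, frame):
        print("received signal", signum)

    # The kernel never sends SIGSTKFLT, so a program is free to use
    # it for its own pre-emptive communication.
    signal.signal(SIGSTKFLT, handler)
    os.kill(os.getpid(), SIGSTKFLT)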
[issue45643] SIGSTKFLT is missing from the signals module on Linux
Change by Gareth Rees:

--
keywords: +patch
pull_requests: +27529
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/29266
[issue45643] SIGSTKFLT is missing from the signals module on Linux
Gareth Rees added the comment:

Tagging vstinner as you have touched Modules/signalmodule.c a few times in the last year. What do you think?

--
nosy: +vstinner
[issue17005] Add a topological sort algorithm
Gareth Rees added the comment:

I'd like to push back on the idea that graphs with isolated vertices are "unusual cases", as suggested by Raymond.

A very common use case (possibly the most common) for topological sorting is job scheduling. In this use case you have a collection of jobs, some of which have dependencies on other jobs, and you want to output a schedule according to which the jobs can be executed so that each job is executed after all its dependencies. In this use case, any job that has no dependencies, and is not itself a dependency of any other job, is an isolated vertex in the dependency graph.

This means that the proposed interface (that is, the interface taking only pairs of vertices) will not be suitable for this use case, and any programmer who tries to use it for this use case will be setting themselves up for failure. A sketch of the failure mode follows.
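In the sketch below, topsort is a stand-in Kahn-style implementation of the pair-based interface proposed in the PR (not the PR's actual code), and the job names are invented:

    from collections import defaultdict

    def topsort(edges):
        """Stand-in pair-based interface: edges are (before, after)."""
        deps = defaultdict(set)        # vertex -> unsatisfied dependencies
        for before, after in edges:
            deps[after].add(before)
            deps.setdefault(before, set())
        order = []
        while deps:
            ready = [v for v, d in deps.items() if not d]
            if not ready:
                raise ValueError("cycle detected")
            for v in ready:
                del deps[v]
                order.append(v)
            for d in deps.values():
                d.difference_update(ready)
        return order

    # Job 'c' has no dependencies and no dependents, so it never
    # appears in the edge list -- and so never appears in the schedule.
    jobs = {'a': [], 'b': ['a'], 'c': []}
    edges = [(dep, job) for job, deps in jobs.items() for dep in deps]
    print(topsort(edges))  # ['a', 'b'] -- job 'c' is silently missing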
[issue40707] Popen.communicate documentation does not say how to get the return code
New submission from Gareth Rees:

When using subprocess.Popen.communicate(), it is natural to wonder how to get the exit code of the subprocess. However, the documentation [1] says:

    Interact with process: Send data to stdin. Read data from stdout
    and stderr, until end-of-file is reached. Wait for process to
    terminate. The optional input argument should be data to be sent
    to the child process, or None, if no data should be sent to the
    child. If streams were opened in text mode, input must be a
    string. Otherwise, it must be bytes.

    communicate() returns a tuple (stdout_data, stderr_data). The data
    will be strings if streams were opened in text mode; otherwise,
    bytes.

If you can guess that communicate() might set returncode, then you can find what you need in the documentation for that attribute [2]:

    The child return code, set by poll() and wait() (and indirectly by
    communicate()).

I suggest that the documentation for communicate() be updated to mention that it sets the returncode attribute. This would be consistent with poll() and wait(), which already mention this.

[1]: https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate
[2]: https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncode

--
assignee: docs@python
components: Documentation
messages: 369502
nosy: docs@python, g...@garethrees.org
priority: normal
severity: normal
status: open
title: Popen.communicate documentation does not say how to get the return code
type: enhancement
versions: Python 3.10, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9
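A sketch of the pattern the documentation should point to (assumes a POSIX ls command, and Python 3.7+ for text=True):

    import subprocess

    proc = subprocess.Popen(
        ["ls", "/nonexistent"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    stdout, stderr = proc.communicate()
    # communicate() waited for the process, so returncode is now set.
    print("exit status:", proc.returncode)  # non-zero: ls failed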
[issue40707] Popen.communicate documentation does not say how to get the return code
Change by Gareth Rees:

--
keywords: +patch
pull_requests: +19559
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/20283
[issue40707] Popen.communicate documentation does not say how to get the return code
Gareth Rees added the comment:

The following test cases in test_subprocess.py call the communicate() method and then immediately assert that the returncode attribute has the expected value:

* test_stdout_none
* test_stderr_redirect_with_no_stdout_redirect
* test_stdout_filedes_of_stdout
* test_communicate_stdin
* test_universal_newlines_communicate_stdin
* test_universal_newlines_communicate_input_none
* test_universal_newlines_communicate_stdin_stdout_stderr
* test_nonexisting_with_pipes
* test_wait_when_sigchild_ignored
* test_startupinfo_copy
* test_close_fds_with_stdio

You'll see that some of these test for success (returncode == 0) and some for failure (returncode == 1). This seems like adequate test coverage to me, but if something is missing, let me know.
[issue40707] Popen.communicate documentation does not say how to get the return code
Gareth Rees added the comment:

Is there anything I can do to move this forward?
[issue41092] Report actual size from 'os.path.getsize'
Gareth Rees added the comment:

The proposed change adds a Boolean flag to os.path.getsize() so that it returns:

    os.stat(filename).st_blocks * 512

(where the 512 is the file system block size on Linux; some work is needed to make this portable to other operating systems).

The Boolean argument here would always be constant in practice -- that is, you'd always call it like this:

    virtual_size = os.path.getsize(filename, apparent=True)
    allocated_size = os.path.getsize(filename, apparent=False)

and never like this:

    x_size = os.path.getsize(filename, apparent=x)

where x varies at runtime. The "no constant bool arguments" design principle [1] suggests that this should be added as a new function, something like os.path.getallocatedsize().

[1] https://mail.python.org/pipermail/python-ideas/2016-May/040181.html

--
nosy: +g...@garethrees.org
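A sketch of what the suggested new function might look like, under the Linux-only 512-byte-unit assumption noted above:

    import os

    def getallocatedsize(path):
        """Return the number of bytes allocated on disk for path.

        st_blocks is counted in 512-byte units on Linux; other
        platforms may differ, so a portable version needs more work.
        """
        return os.stat(path).st_blocks * 512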
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
New submission from Gareth Rees:

The documentation for sys.exit says, "The optional argument arg can be an integer giving the exit status (defaulting to zero), or another type of object". However, the arguments that are treated as exit statuses are actually "subtypes of int". So, a bool argument is fine:

    $ python2.7 -c "import sys; sys.exit(False)"; echo $?
    0

But a long argument is not:

    $ python2.7 -c "import sys; sys.exit(long(0))"; echo $?
    0
    1

The latter behaviour can be surprising since functions like os.spawnv may return the exit status of the executed process as a long on some platforms, so that if you try to pass on the exit code via

    code = os.spawnv(...)
    sys.exit(code)

you may get a mysterious surprise: code is 0 but the exit code is 1.

It would be simple to change line 1112 of pythonrun.c from

    if (PyInt_Check(value))

to

    if (PyInt_Check(value) || PyLong_Check(value))

(This issue is not present in Python 3 because there is no longer a distinction between int and long.)

--
components: Library (Lib)
messages: 156470
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: sys.exit documents argument as "integer" but actually requires "subtype of int"
type: behavior
versions: Python 2.6, Python 2.7
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment:

> Wouldn't you also have to deal with possible errors from the
> PyInt_AsLong call?

Good point. But I note that Python 3 just does

    exitcode = (int)PyLong_AsLong(value);

so maybe it's not important to do error handling here.
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

In Python 2.7, multiprocessing.heap.Arena uses an anonymous memory mapping on Unix. Anonymous memory mappings can be shared between processes but only via fork(). But Python 3 supports other ways of starting subprocesses (see issue 8713 [1]) and so an anonymous memory mapping no longer works. So instead a temporary file is created, filled with zeros to the given size, and mapped into memory (see changeset 3b82e0d83bf9 [2]).

It is the zero-filling of the temporary file that takes the time, because this forces the operating system to allocate space on the disk. But why not use ftruncate() (instead of write()) to quickly create a file with holes? POSIX says [3], "If the file size is increased, the extended area shall appear as if it were zero-filled", which would seem to satisfy the requirement.

[1] https://bugs.python.org/issue8713
[2] https://hg.python.org/cpython/rev/3b82e0d83bf9
[3] http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html

--
nosy: +g...@garethrees.org
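A minimal sketch of the suggested approach (Unix-only; the size is arbitrary and the variable names are invented for illustration):

    import mmap
    import os
    import tempfile

    size = 100 * 1024 * 1024  # 100 MiB

    # Instead of writing 100 MiB of zeros, extend the file with a
    # hole; POSIX guarantees the extended area reads back as zeros.
    fd, name = tempfile.mkstemp()
    os.unlink(name)
    os.ftruncate(fd, size)
    buf = mmap.mmap(fd, size)
    assert buf[:4] == b"\x00\x00\x00\x00"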
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

Note that some filesystems (e.g. HFS+) don't support sparse files, so creating a large Arena will still be slow on these filesystems even if the file is created using ftruncate(). (This could be fixed, for the "fork" start method only, by using anonymous maps in that case.)
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

If you need the 2.7 behaviour (anonymous mappings) in 3.5 then you can still do it, with some effort. I think the approach that requires the smallest amount of work would be to ensure that subprocesses are started using fork(), by calling multiprocessing.set_start_method('fork'), and then monkey-patch multiprocessing.heap.Arena.__init__ so that it creates anonymous mappings using mmap.mmap(-1, size).

(I suggested above that Python could be modified to create anonymous mappings in the 'fork' case, but now that I look at the code in detail, I see that it would be tricky, because the Arena class has no idea about the Context in which it is going to be used -- at the moment you can create one shared object and then pass it to subprocesses under different Contexts, so the shared objects have to support the lowest common denominator.)
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

Nonetheless this is bound to be a nasty performance regression for many people doing big data processing with NumPy/SciPy/Pandas and multiprocessing and moving from 2 to 3, so even if it can't be fixed, the documentation ought to warn about the problem and explain how to work around it.
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

I see now that the default start method is 'fork' (except on Windows), so calling set_start_method is unnecessary.

Note that you don't have to edit multiprocessing/heap.py; you can "monkey-patch" it in the program that needs the anonymous mapping:

    import mmap
    from multiprocessing.heap import Arena

    def anonymous_arena_init(self, size, fd=-1):
        "Create Arena using an anonymous memory mapping."
        self.size = size
        self.fd = fd  # still kept but is not used!
        self.buffer = mmap.mmap(-1, self.size)

    Arena.__init__ = anonymous_arena_init

As for what it will break -- any code that uses the 'spawn' or 'forkserver' start methods.
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

I propose:

1. Ask Richard Oudkerk why in changeset 3b82e0d83bf9 the temporary file is zero-filled and not truncated. Perhaps there's some file system where this is necessary? (I tested HFS+, which doesn't support sparse files, and zero-filling seems not to be necessary, but maybe there's some other file system where it is?)

2. If there's no good reason for zero-filling the temporary file, replace it with a call to os.ftruncate(fd, size).

3. Update the documentation to mention the performance issue when porting multiprocessing code from 2 to 3. Unfortunately, I don't think there's any advice that the documentation can give that will help work around it -- monkey-patching works but is not supported.

4. Consider writing a fix, or at least a supported workaround. Here's a suggestion: update multiprocessing.sharedctypes and multiprocessing.heap so that they use anonymous maps in the 'fork' context. The idea is to update the RawArray and RawValue functions so that they take the context, and then pass the context down to _new_value, BufferWrapper.__init__ and thence to Heap.malloc, where it can be used to determine what kind of Arena (file-backed or anonymous) should be used to satisfy the allocation request. The Heap class would have to segregate its blocks according to what kind of Arena they come from.
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Gareth Rees added the comment:

Patch looks good to me. The test cases are not very systematic (why only int, double, and long long?), but that's not the fault of the patch and shouldn't prevent its being applied.

--
nosy: +g...@garethrees.org
[issue30943] printf-style bytes formatting sometimes does not work
Gareth Rees added the comment:

Test case minimization:

    Python 3.6.1 (default, Apr 24 2017, 06:18:27)
    [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> b'a\x00%(a)s' % {b'a': b'a'}
    b'a\x00%(a)s'

It seems that all formatting operations after a zero byte are ignored. This is because the code for parsing the format string (in _PyBytes_FormatEx in Objects/bytesobject.c) uses the following approach to find the next % character:

    while (--fmtcnt >= 0) {
        if (*fmt != '%') {
            Py_ssize_t len;
            char *pos;

            pos = strchr(fmt + 1, '%');

But strchr uses the C notion of strings, which are terminated by a zero byte.

--
nosy: +g...@garethrees.org
[issue30943] printf-style bytes formatting sometimes does not work
Gareth Rees added the comment:

This was already noted in issue29714 and fixed by Xiang Zhang in commit b76ad5121e2.
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Changes by Gareth Rees:

--
nosy: +benjamin.peterson
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Gareth Rees added the comment:

Has Antony Lee made a copyright assignment?
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Gareth Rees added the comment:

(If he hasn't, I don't think I can make a PR, because I read his patch and so any implementation I make now is based on his patch and so potentially infringes his copyright.)
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Changes by Gareth Rees:

--
pull_requests: +2801
[issue30973] Regular expression "hangs" interpreter
Gareth Rees added the comment:

This is the usual exponential backtracking behaviour of Python's regex engine. The problem is that the regex (?:[^*]+|\*[^/])* can match against a string in exponentially many ways, and Python's regex engine tries all of them before giving up.

--
nosy: +g...@garethrees.org
[issue30976] multiprocessing.Process.is_alive can show True for dead processes
Gareth Rees added the comment:

This is a race condition -- when os.kill returns, that means that the signal has been delivered, but it does not mean that the subprocess has exited yet. You can see this by inserting a sleep after the kill and before the liveness check:

    print(proc.is_alive())
    os.kill(proc.pid, signal.SIGTERM)
    time.sleep(1)
    print(proc.is_alive())

This (probably) gives the process time to exit. (Presumably the psutil.pid_exists() call has a similar effect.) Of course, waiting for 1 second (or any amount of time) might not be enough. The right thing to do is to join the process. Then when the join exits you know it died.
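A minimal sketch of the recommended fix (Unix-only, since it uses os.kill and a negative exitcode; the worker function is invented):

    import multiprocessing
    import os
    import signal
    import time

    def worker():
        time.sleep(60)

    if __name__ == "__main__":
        proc = multiprocessing.Process(target=worker)
        proc.start()
        os.kill(proc.pid, signal.SIGTERM)
        proc.join()                # wait for the child to actually exit
        print(proc.is_alive())     # now reliably False
        print(proc.exitcode)       # -15, i.e. -signal.SIGTERM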
[issue24869] shlex lineno inaccurate with certain inputs
Changes by Gareth Rees:

--
pull_requests: +2849
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment:

I've made a pull request. (Not because I expect it to be merged as-is, but to provide a starting point for discussion.)

--
nosy: +petri.lehtinen, vinay.sajip
[issue17005] Add a topological sort algorithm
Gareth Rees added the comment:

I approve in general of the principle of including a topological sort algorithm in the standard library. However, I have three problems with the approach in PR 11583:

1. The name "topsort" is most naturally parsed as "top sort", which could be misinterpreted (as a sort that puts items on top in some way). If the name must be abbreviated then "toposort" would be better.

2. "Topological sort" is a terrible name: the analogy with topological graph theory is (i) unlikely to be helpful to anyone; and (ii) not quite right. I know that the name is widely used in computing, but a name incorporating "linearize" or "linear order" or "total order" would be much clearer.

3. The proposed interface is not suitable for all cases! The function topsort takes a list of directed edges and returns a linear order on the vertices in those edges (if any linear order exists). But this means that if there are any isolated vertices (that is, vertices with no edges) in the dependency graph, then there is no way of passing those vertices to the function. This means that (i) it is inconvenient to use the proposed interface, because you have to find the isolated vertices in your graph and add them to the linear order after calling the function; and (ii) it is a bug magnet, because many programmers will omit this step, meaning that their code will unexpectedly fail when their graph has an isolated vertex.

The interface needs to be redesigned to take the graph in some other representation.

--
nosy: +g...@garethrees.org
[issue17005] Add a topological sort algorithm
Gareth Rees added the comment:

Just to elaborate on what I mean by "bug magnet". (I'm sure Pablo understands this, but there may be other readers who would like to see it spelled out.)

Suppose that you have a directed graph represented as a mapping from a vertex to an iterable of its out-neighbours. Then the "obvious" way to get a total order on the vertices in the graph would be to generate the edges and pass them to topsort:

    def edges(graph):
        return ((v, w) for v, ww in graph.items() for w in ww)

    order = topsort(edges(graph))

This will appear to work fine if it is never tested with a graph that has isolated vertices (which would be an all too easy omission). To handle isolated vertices you have to remember to write something like this:

    reversed_graph = {v: [] for v in graph}
    for v, ww in graph.items():
        for w in ww:
            reversed_graph[w].append(v)

    order = topsort(edges(graph)) + [
        v for v, ww in graph.items()
        if not ww and not reversed_graph[v]]

I think it likely that beginner programmers will forget to do this and be surprised later on when their total order is missing some of the vertices.
[issue29977] re.sub stalls forever on an unmatched non-greedy case
Gareth Rees added the comment:

The problem here is that both "." and "\s" match a whitespace character, and because you have the re.DOTALL flag turned on this includes "\n", and so the number of different ways in which (.|\s)* can be matched against a string is exponential in the number of whitespace characters in the string.

It is best to design your regular expression so as to limit the number of different ways it can match. Here I recommend the expression:

    /\*(?:[^*]|\*[^/])*\*/

which can match in only one way.

--
nosy: +g...@garethrees.org
[issue29977] re.sub stalls forever on an unmatched non-greedy case
Gareth Rees added the comment:

See also issue28690, issue212521, issue753711, issue1515829, etc.
[issue30564] Base64 decoding gives incorrect outputs.
Gareth Rees added the comment:

RFC 4648 section 3.5 says:

    The padding step in base 64 and base 32 encoding can, if improperly
    implemented, lead to non-significant alterations of the encoded
    data. For example, if the input is only one octet for a base 64
    encoding, then all six bits of the first symbol are used, but only
    the first two bits of the next symbol are used. These pad bits MUST
    be set to zero by conforming encoders, which is described in the
    descriptions on padding below. If this property do not hold, there
    is no canonical representation of base-encoded data, and multiple
    base-encoded strings can be decoded to the same binary data. If
    this property (and others discussed in this document) holds, a
    canonical encoding is guaranteed. In some environments, the
    alteration is critical and therefore decoders MAY chose to reject
    an encoding if the pad bits have not been set to zero.

If decoders may choose to reject non-canonical encodings, then they may also choose to accept them. (That's the meaning of "MAY" in RFC 2119.) So I think Python's behaviour is conforming to the standard.

--
nosy: +g...@garethrees.org
[issue28647] python --help: -u is misdocumented as binary mode
Gareth Rees added the comment:

You're welcome.
[issue31895] Native hijri calendar support
Gareth Rees added the comment:

It is a substantial undertaking, requiring a great deal of expertise, to implement the Islamic calendar. The difficulty is that there are multiple versions of the calendar. In some places the calendar is based on human observation of the new moon, and so a database of past observations is needed (and future dates can't be represented). In other places the time of observability of the new moon is calculated according to an astronomical ephemeris (and different ephemerides are used in different places and at different times).

--
nosy: +g...@garethrees.org
[issue31895] Native hijri calendar support
Gareth Rees added the comment:

convertdate does not document which version of the Islamic calendar it uses, but looking at the source code, it seems that it uses a rule-based calendar which has a 30-year cycle with 11 leap years. This won't help Haneef, who wants the Umm al-Qura calendar.
[issue32194] When creating list of dictionaries and updating datetime objects one by one, all values are set to last one of the list.
Gareth Rees added the comment:

The behaviour of the * operator (and the associated gotcha) is documented under "Common sequence operations" [1]:

    Note that items in the sequence s are not copied; they are
    referenced multiple times. This often haunts new Python
    programmers ...

There is also an entry in the FAQ [2]:

    replicating a list with * doesn't create copies, it only creates
    references to the existing objects

[1] https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range
[2] https://docs.python.org/3/faq/programming.html#faq-multidimensional-list

--
nosy: +g...@garethrees.org
resolution: -> not a bug
stage: -> resolved
status: open -> closed
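A minimal demonstration of the gotcha and the usual fix (a list comprehension); the dictionary contents are invented for illustration:

    import datetime

    # One dict object referenced three times:
    rows = [{"when": None}] * 3
    rows[0]["when"] = datetime.datetime(2017, 12, 1)
    print(rows[1]["when"])  # 2017-12-01 00:00:00 -- same object!

    # Three independent dicts:
    rows = [{"when": None} for _ in range(3)]
    rows[0]["when"] = datetime.datetime(2017, 12, 1)
    print(rows[1]["when"])  # None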
[issue20941] pytime.c:184 and pytime.c:218: runtime error, outside the range of representable values of type 'long'
Gareth Rees added the comment:

> How did you get this warning?

This looks like runtime output from a program built using Clang/LLVM with -fsanitize=undefined. See here:

    http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation

Signed integer overflow is undefined behaviour, so by the time

    *sec = (time_t)intpart;

has been evaluated, the undefined behaviour has already happened. It is too late to check for it afterwards.

--
nosy: +Gareth.Rees
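A sketch of the kind of pre-conversion range check that avoids the undefined behaviour; it assumes time_t is no wider than long, and the function name and bounds are illustrative, not the actual CPython fix:

    #include <errno.h>
    #include <limits.h>
    #include <time.h>

    /* The range check happens *before* the conversion, because an
       out-of-range double-to-integer conversion is itself undefined
       behaviour and cannot be detected after the fact.  Note that
       (double)LONG_MAX rounds up to 2**63, so the upper comparison
       must be strict; the !(...) form also rejects NaN. */
    static int
    double_to_time_t(double intpart, time_t *sec)
    {
        if (!(intpart >= -(double)LONG_MAX && intpart < (double)LONG_MAX)) {
            errno = ERANGE;
            return -1;
        }
        *sec = (time_t)intpart;
        return 0;
    }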
[issue19362] Documentation for len() fails to mention that it works on sets
New submission from Gareth Rees:

The help text for the len() built-in function says:

    Return the number of items of a sequence or mapping.

This omits to mention that len() works on sets too. I suggest this be changed to:

    Return the number of items of a sequence, mapping, or set.

Similarly, the documentation for len() says:

    The argument may be a sequence (string, tuple or list) or a mapping
    (dictionary).

I suggest this be changed to:

    The argument may be a sequence (string, tuple or list), a mapping
    (dictionary), or a set.

(Of course, strictly speaking, len() accepts any object with a __len__ method, but sequences, mappings and sets are the ones that are built-in to the Python core, and so these are the ones it is important to mention in the help and the documentation.)

--
assignee: docs@python
components: Documentation
files: len-set.patch
keywords: patch
messages: 201019
nosy: Gareth.Rees, docs@python
priority: normal
severity: normal
status: open
title: Documentation for len() fails to mention that it works on sets
type: enhancement
Added file: http://bugs.python.org/file32313/len-set.patch
[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map
New submission from Gareth Rees:

In Python 2.7, future_builtins.map accepts None as its first (function) argument:

    Python 2.7.5 (default, Aug 1 2013, 01:01:17)
    >>> from future_builtins import map
    >>> list(map(None, range(3), 'ABC'))
    [(0, 'A'), (1, 'B'), (2, 'C')]

But in Python 3.x, map does not accept None as its first argument:

    Python 3.3.2 (default, May 21 2013, 11:50:47)
    >>> list(map(None, range(3), 'ABC'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'NoneType' object is not callable

The documentation says, "if you want to write code compatible with Python 3 builtins, import them from this module," so this incompatibility may give Python 2.7 programmers the false impression that a program which uses map(None, ...) is portable to Python 3.

I suggest that future_builtins.map in Python 2.7 should behave the same as map in Python 3: that is, it should raise a TypeError if None is passed as the first argument.

--
components: Library (Lib)
messages: 201020
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: Python 2.7's future_builtins.map is not compatible with Python 3's map
type: behavior
versions: Python 2.7
[issue19362] Documentation for len() fails to mention that it works on sets
Gareth Rees added the comment: I considered suggesting "container", but the problem is that "container" is used elsewhere to mean "object supporting the 'in' operator" (in particular, collections.abc.Container has a __contains__ method but no __len__ method). The abstract base class for "object with a length" is collections.abc.Sized, but I don't think using the term "sized" would be clear to users. -- ___ Python tracker <http://bugs.python.org/issue19362> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
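To make the terminology concrete: collections.abc.Container only promises the 'in' operator, and collections.abc.Sized only promises len(); neither implies the other, which is why "container" would be a misleading word for "object with a length":

    >>> from collections.abc import Container, Sized
    >>> class Haystack:
    ...     def __contains__(self, item):
    ...         return True
    ...
    >>> isinstance(Haystack(), Container), isinstance(Haystack(), Sized)
    (True, False)
    >>> issubclass(Sized, Container)
    False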
[issue20507] TypeError from str.join has no message
New submission from Gareth Rees: If you pass an object of the wrong type to str.join, Python raises a TypeError with no error message:

    Python 3.4.0b3 (default, Jan 27 2014, 02:26:41)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> ''.join(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError

It is unnecessarily hard to understand from this error what the problem actually was. Which object had the wrong type? What type should it have been? Normally a TypeError is associated with a message explaining which type was wrong, and what it should have been. For example:

    >>> b''.join(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can only join an iterable

It would be nice if the TypeError from ''.join(1) included a message like this. The reason for the lack of message is that PyUnicode_Join starts out by calling PySequence_Fast(seq, "") which suppresses the error message from PyObject_GetIter. This commit by Tim Peters is responsible: <http://hg.python.org/cpython/rev/8579859f198c>. The commit message doesn't mention the suppression of the message, so I assume it was an oversight. I suggest replacing the line:

    fseq = PySequence_Fast(seq, "");

in PyUnicode_Join in unicodeobject.c with:

    fseq = PySequence_Fast(seq, "can only join an iterable");

for consistency with bytes_join in stringlib/join.h. Patch attached.

-- components: Interpreter Core files: join.patch keywords: patch messages: 210200 nosy: Gareth.Rees priority: normal severity: normal status: open title: TypeError from str.join has no message type: behavior versions: Python 3.4 Added file: http://bugs.python.org/file33900/join.patch ___ Python tracker <http://bugs.python.org/issue20507> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
New submission from Gareth Rees: If you try to look up an out-of-range address from an object returned by ipaddress.ip_network, then ipaddress._BaseNetwork.__getitem__ raises an IndexError with no message:

    Python 3.4.0b3 (default, Jan 27 2014, 02:26:41)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ipaddress
    >>> ipaddress.ip_network('2001:db8::8/125')[100]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/ipaddress.py", line 601, in __getitem__
        raise IndexError
    IndexError

Normally an IndexError is associated with a message explaining the cause of the error. For example:

    >>> [].pop()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: pop from empty list

It would be nice if the IndexError from ipaddress._BaseNetwork.__getitem__ included a message like this. With the attached patch, the error message looks like this in the positive case:

    >>> ipaddress.ip_network('2001:db8::8/125')[100]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/gdr/hg.python.org/cpython/Lib/ipaddress.py", line 602, in __getitem__
        % (self, self.num_addresses))
    IndexError: 100 out of range 0..7 for 2001:db8::8/125

and like this in the negative case:

    >>> ipaddress.ip_network('2001:db8::8/125')[-100]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/gdr/hg.python.org/cpython/Lib/ipaddress.py", line 608, in __getitem__
        % (n - 1, self.num_addresses, self))
    IndexError: -100 out of range -8..-1 for 2001:db8::8/125

(If you have a better suggestion for how the error message should read, I could submit a revised patch. I suppose it could just say "address index out of range" for consistency with list.__getitem__ and str.__getitem__. But I think the extra information is likely to be helpful for the programmer who is trying to track down the cause of an error.)

-- components: Library (Lib) files: ipaddress.patch keywords: patch messages: 210224 nosy: Gareth.Rees priority: normal severity: normal status: open title: IndexError from ipaddress._BaseNetwork.__getitem__ has no message type: behavior versions: Python 3.4 Added file: http://bugs.python.org/file33903/ipaddress.patch ___ Python tracker <http://bugs.python.org/issue20508> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
Changes by Gareth Rees : -- type: behavior -> enhancement ___ Python tracker <http://bugs.python.org/issue20508> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19362] Documentation for len() fails to mention that it works on sets
Gareth Rees added the comment: Here's a revised patch using Ezio's suggestion ("Return the number of items of a sequence or container"). -- Added file: http://bugs.python.org/file33904/len-set.patch ___ Python tracker <http://bugs.python.org/issue19362> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19362] Documentation for len() fails to mention that it works on sets
Changes by Gareth Rees : -- title: Documentation for len() fails to mention that it works on sets -> Documentation for len() fails to mention that it works on sets versions: +Python 3.4 ___ Python tracker <http://bugs.python.org/issue19362> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Patch attached. I added a test case to Lib/test/test_sys.py. -- Added file: http://bugs.python.org/file33906/exit.patch ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20510] Test cases in test_sys don't match the comments
New submission from Gareth Rees: Lib/test/test_sys.py contains test cases with incorrect comments -- or comments with incorrect test cases, if you prefer:

    # call without argument
    try:
        sys.exit(0)
    except SystemExit as exc:
        self.assertEqual(exc.code, 0)
    ...

    # call with tuple argument with one entry
    # entry will be unpacked
    try:
        sys.exit(42)
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...

    # call with integer argument
    try:
        sys.exit((42,))
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...

(In the quote above I've edited out some inessential detail; see the file if you really want to know.) You can see that in the first test case sys.exit is called with an argument (although the comment claims otherwise); in the second it is called with an integer (not a tuple); and in the third it is called with a tuple (not an integer). These comments have been unchanged since the original commit by Walter Dörwald <http://hg.python.org/cpython/rev/6a1394660270>. I've attached a patch that corrects the first test case and swaps the comments for the second and third test cases:

    # call without argument
    rc = subprocess.call([sys.executable, "-c", "import sys; sys.exit()"])
    self.assertEqual(rc, 0)

    # call with integer argument
    try:
        sys.exit(42)
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...

    # call with tuple argument with one entry
    # entry will be unpacked
    try:
        sys.exit((42,))
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...

Note that in the first test case, sys.exit() with no argument actually raises SystemExit(None), so it's not sufficient to catch the SystemExit and check exc.code; I need to check that it actually gets translated to 0 on exit.

-- components: Tests files: exittest.patch keywords: patch messages: 210246 nosy: Gareth.Rees priority: normal severity: normal status: open title: Test cases in test_sys don't match the comments type: enhancement versions: Python 3.4 Added file: http://bugs.python.org/file33908/exittest.patch ___ Python tracker <http://bugs.python.org/issue20510> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map
Gareth Rees added the comment: What about a documentation change instead? The future_builtins chapter <http://docs.python.org/2/library/future_builtins.html> in the standard library documentation could note the incompatibility. I've attached a patch which adds the following note to the documentation for future_builtins.map:

    Note: In Python 3, map() does not accept None for the function
    argument. (zip() can be used instead.)

-- status: closed -> open ___ Python tracker <http://bugs.python.org/issue19363> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
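For illustration, the zip() substitution suggested in the note covers the common idiom where map(None, ...) is used to build tuples of corresponding elements; this works identically in Python 2 and 3:

    >>> list(zip(range(3), 'ABC'))
    [(0, 'A'), (1, 'B'), (2, 'C')]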
[issue20510] Test cases in test_sys don't match the comments
Gareth Rees added the comment: I normally try not to make changes "while we're in here" for fear of introducing errors! But I guess the test cases are less critical, so I've taken your review comments as a license to submit a revised patch that: * incorporates your suggestion to use assert_python_ok from test.script_helper, instead of subprocess.call; * replaces the other uses of subprocess.call with assert_python_failure and adds a check on stdout; * cleans up the assertion-testing code using the context manager form of unittest.TestCase.assertRaises. I've signed and submitted a contributor agreement as requested. -- Added file: http://bugs.python.org/file33914/exittest-1.patch ___ Python tracker <http://bugs.python.org/issue20510> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
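A rough sketch of the shape of the revised first test, assuming the 3.4-era helper location (test.script_helper; the helper later moved to test.support.script_helper); the committed patch is authoritative:

    from test.script_helper import assert_python_ok

    # "call without argument": run in a subprocess and check that the
    # interpreter's exit status is 0.
    rc, out, err = assert_python_ok('-c', 'import sys; sys.exit()')
    assert rc == 0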
[issue19362] Documentation for len() fails to mention that it works on sets
Gareth Rees added the comment: Here's a revised patch for Terry ("Return the number of items of a sequence or collection.") -- Added file: http://bugs.python.org/file33916/len-set.patch ___ Python tracker <http://bugs.python.org/issue19362> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment: Yury, let me see if I can move this issue forward. I clearly haven't done a good job of explaining these problems, how they are related, and why it makes sense to solve them together, so let me have a go now.

1. tokenize.untokenize() raises AssertionError if you pass it a sequence of tokens output from tokenize.tokenize(). This was my original problem report, and it's still not fixed in Python 3.4:

    Python 3.4.0b3 (default, Jan 27 2014, 02:26:41)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tokenize, io
    >>> t = list(tokenize.tokenize(io.BytesIO('1+1'.encode('utf8')).readline))
    >>> tokenize.untokenize(t)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 317, in untokenize
        out = ut.untokenize(iterable)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 246, in untokenize
        self.add_whitespace(start)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 232, in add_whitespace
        assert row <= self.prev_row
    AssertionError

This defeats any attempt to use the sequence:

    input code -> tokenize -> transform -> untokenize -> output code

to transform Python code. But this ought to be the main use case for the untokenize function! That's how I came across the problem in the first place, when I was starting to write Minipy <https://github.com/gareth-rees/minipy>.

2. Fixing problem #1 is easy (just swap <= for >=), but it raises the question: why wasn't this mistake caught by test_tokenize? There's a test function roundtrip() whose docstring says:

    Test roundtrip for `untokenize`. `f` is an open file or a string.
    The source code in f is tokenized, converted back to source code
    via tokenize.untokenize(), and tokenized again from the latter.
    The test fails if the second tokenization doesn't match the first.

If I don't fix the problem with roundtrip(), then how can I be sure I have fixed the problem? Clearly it's necessary to fix the test case and establish that it provokes the assertion. So why doesn't roundtrip() detect the error? Well, it turns out that tokenize.untokenize() has two modes of operation and roundtrip() only tests one of them. The documentation for tokenize.untokenize() is rather cryptic, and all it says is:

    Each element returned by the [input] iterable must be a token
    sequence with at least two elements, a token number and token
    value. If only two tokens are passed, the resulting output is poor.

By reverse-engineering the implementation, it seems that it has two modes of operation. In the first mode (which I have called "compatibility" mode after the method Untokenizer.compat() that implements it) you pass it tokens in the form of 2-element tuples (type, text). These must have exactly 2 elements. In the second mode (which I have called "full" mode based on the description "full input" in the docstring) you pass it tokens in the form of tuples with 5 elements (type, text, start, end, line). These are compatible with the namedtuples returned from tokenize.tokenize(). The "full" mode has the buggy assertion, but test_tokenize.roundtrip() only tests the "compatibility" mode. So I must (i) fix roundtrip() so that it tests both modes; (ii) improve the documentation for tokenize.untokenize() so that programmers have some chance of figuring this out in future!
3. As soon as I make roundtrip() test both modes it provokes the assertion failure. Good, so I can fix the assertion. Problem #1 solved. But now there are test failures in "full" mode:

    $ ./python.exe -m test test_tokenize
    [1/1] test_tokenize
    **
    File "/Users/gdr/hg.python.org/cpython/Lib/test/test_tokenize.py", line ?, in test.test_tokenize.__test__.doctests
    Failed example:
        for testfile in testfiles:
            if not roundtrip(open(testfile, 'rb')):
                print("Roundtrip failed for file %s" % testfile)
                break
        else:
            True
    Expected:
        True
    Got:
        Roundtrip failed for file /Users/gdr/hg.python.org/cpython/Lib/test/test_platform.py
    *
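For readers unfamiliar with the two modes described in point 2, here is a minimal sketch of both (the result comes back as bytes because the token stream begins with an ENCODING token; with the assertion fixed, both calls succeed, whereas previously the second raised AssertionError):

    import io, tokenize

    source = "1 + 1"
    tokens = list(tokenize.tokenize(io.BytesIO(source.encode('utf-8')).readline))

    # "Compatibility" mode: 2-element tuples (type, text). Positions
    # are guessed, so spacing may not match the input exactly.
    print(tokenize.untokenize((tok[0], tok[1]) for tok in tokens))

    # "Full" mode: the 5-element tuples exactly as produced by
    # tokenize.tokenize().
    print(tokenize.untokenize(tokens))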
[issue12691] tokenize.untokenize is broken
Changes by Gareth Rees : -- nosy: +benjamin.peterson ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment: This morning I noticed that I had forgotten to update the library reference, and I also noticed two more problems to add to the list above:

6. Although Lib/test/test_tokenize.py looks like it contains tests for backslash-newline handling, these tests are ineffective. Here they are:

    >>> roundtrip("x=1+\\n"
    ...           "1\\n"
    ...           "# This is a comment\\n"
    ...           "# This also\\n")
    True

    >>> roundtrip("# Comment \\nx = 0")
    True

There are two problems here: (i) because of the double string escaping, these are not backslash-newline, they are backslash-n; (ii) the roundtrip() test is too weak to detect this problem: tokenize() outputs an ERRORTOKEN for the backslash and untokenize() restores it, so the round-trip property is satisfied.

7. Problem 6 shows the difficulty of using doctests for this kind of test. It would be easier to ensure the correctness of these tests if the docstring was read from a separate file, so that at least the tests only need one level of string escaping.

I fixed problem 6 by updating these tests to use dump_tokens() instead of roundtrip(). I have not fixed problem 7 (like 4 and 5, I can leave it for another issue). Revised patch attached.

-- Added file: http://bugs.python.org/file33924/Issue12691.patch ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
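A sketch showing the ERRORTOKEN behaviour that makes the round-trip test too weak; the source below contains a backslash followed by the letter 'n', not a line continuation, and the exact output may vary slightly across versions:

    import io, tokenize

    source = "x=1+\\n1\n"  # chars 'x=1+', a backslash, 'n1', a newline
    readline = io.BytesIO(source.encode('utf-8')).readline
    for tok in tokenize.tokenize(readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))
    # The stray backslash is reported as ERRORTOKEN '\\', and
    # untokenize() writes it straight back out, so roundtrip() passes.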
[issue12691] tokenize.untokenize is broken
Changes by Gareth Rees : -- assignee: -> docs@python components: +Documentation, Tests nosy: +docs@python ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Changes by Gareth Rees : Removed file: http://bugs.python.org/file33919/Issue12691.patch ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment: I did some research on the cause of this issue. The assertion was added in this change by Jeremy Hylton in August 2006: <https://mail.python.org/pipermail/python-checkins/2006-August/055812.html> (The corresponding Mercurial commit is here: <http://hg.python.org/cpython/rev/cc992d75d5b3#l217.25>). At that point I believe the assertion was reasonable. I think it would have been triggered by backslash-continued lines, but otherwise it worked. But in this change <http://hg.python.org/cpython/rev/51e24512e305> in March 2008 Trent Nelson applied this patch by Michael Foord <http://bugs.python.org/file9741/tokenize_patch.diff> to implement PEP 263 and fix issue719888. The patch added ENCODING tokens to the output of tokenize.tokenize(). The ENCODING token is always generated with row number 0, while the first actual token is generated with row number 1. So now every token stream from tokenize.tokenize() sets off the assertion. The lack of a test case for tokenize.untokenize() in "full" mode meant that it was (and is) all too easy for someone to accidentally break it like this. -- ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20539] math.factorial may throw OverflowError
Gareth Rees added the comment: It's not a case of internal storage overflowing. The error is from Modules/mathmodule.c:1426 and it's the input 10**19 that's too large to convert to a C long. You get the same kind of error in other places where PyLong_AsLong or PyLong_AsInt is called on a user-supplied value, for example:

    >>> import pickle
    >>> pickle.dumps(10**19, 10**19)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long

-- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue20539> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
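For reference, the behaviour under discussion looks like this (the exact message depends on the platform's C long size):

    >>> import math
    >>> math.factorial(10**19)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long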
[issue20606] Operator Documentation Example doesn't work
Gareth Rees added the comment: The failing example is:

    d = {}
    keys = range(256)
    vals = map(chr, keys)
    map(operator.setitem, [d]*len(keys), keys, vals)

which works in Python 2 where map returns a list, but not in Python 3 where map returns an iterator. Doc/library/operator.rst follows the example with this note:

    .. XXX: find a better, readable, example

Additional problems with the example:

1. It's poorly motivated because a dictionary comprehension would be simpler and shorter:

    d = {i: chr(i) for i in range(256)}

2. It's also unclear why you'd need this dictionary when you could just call the function chr (but I suppose some interface might require a dictionary rather than a function).

3. To force the map to be evaluated, you need to write list(map(...)) which allocates an unnecessary list object and then throws it away. To avoid the unnecessary allocation you could use the "consume" recipe from the itertools documentation and write collections.deque(map(...), maxlen=0) but this is surely too obscure to use as an example.

I had a look through the Python sources, and made an Ohloh Code search for "operator.setitem" and I didn't find any good examples of its use, so I think the best thing to do is just to delete the example. <http://code.ohloh.net/search?s=%22operator.setitem%22&pp=0&fl=Python&mp=1&ml=1&me=1&md=1&ff=1&filterChecked=true>

-- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue20606> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
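The underlying gotcha: in Python 3, map() is lazy, so the setitem calls never run unless something consumes the iterator. A quick demonstration:

    >>> import operator
    >>> d = {}
    >>> m = map(operator.setitem, [d], [0], ['a'])
    >>> d                # nothing has happened yet
    {}
    >>> list(m)          # consuming the iterator performs the stores
    [None]
    >>> d
    {0: 'a'}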
[issue20606] Operator Documentation Example doesn't work
Changes by Gareth Rees : -- keywords: +patch Added file: http://bugs.python.org/file34059/operator.patch ___ Python tracker <http://bugs.python.org/issue20606> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map
Gareth Rees added the comment: Sorry about that; here it is. I had second thoughts about recommending zip() as an alternative (that would only work for cases where the None was constant; in other cases you might need lambda *args: args, but this seemed too complicated), so the note now says only:

    Note: In Python 3, map() does not accept None for the function
    argument.

-- keywords: +patch Added file: http://bugs.python.org/file34117/issue19363.patch ___ Python tracker <http://bugs.python.org/issue19363> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment: Thanks for your work on this, Terry. I apologise for the complexity of my original report, and will try not to do it again. -- ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment:

> benchmarks show it to be more than twice as fast

I'm sure they do, but other benchmarks show it to be more than twice as slow. Try something like:

    iterables = [range(100)] + [()] * 100

-- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue20727> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
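For context, the recipe under discussion (referred to as roundrobin1 in the timings below) is the one from the itertools documentation, credited there to George Sakkis; the proposed replacement, roundrobin2, is in the patch attached to the issue and is not reproduced here:

    from itertools import cycle, islice

    def roundrobin1(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        pending = len(iterables)
        nexts = cycle(iter(it).__next__ for it in iterables)
        while pending:
            try:
                for next in nexts:
                    yield next()
            except StopIteration:
                # One iterable is exhausted: rebuild the cycle with the
                # remaining ones. This rebuild is what later turns out
                # to cost O(n) per exhausted iterable.
                pending -= 1
                nexts = cycle(islice(nexts, pending))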
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment: If 100 doesn't work for you, try a larger number. -- ___ Python tracker <http://bugs.python.org/issue20727> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment: I suspect I messed up the timing I did yesterday, because today I find that 100 isn't large enough, but here's what I found today (in Python 3.3):

    >>> from timeit import timeit
    >>> test = [tuple(range(300))] + [()] * 100
    >>> timeit(lambda:list(roundrobin1(*test)), number=1)  # old recipe
    8.386148632998811
    >>> timeit(lambda:list(roundrobin2(*test)), number=1)  # new recipe
    16.757110453007044

The new recipe is more than twice as slow as the old in this case, and its performance gets relatively worse as you increase the number 300. I should add that I do recognise that the new recipe is better for nearly all cases (it's simpler as well as faster), but I want to point out an important feature of the old recipe, namely that it discards iterables as they are finished with, giving it worst-case O(n) performance (albeit slow), whereas the new recipe has worst-case O(n^2). As we found out with hash tables, worst-case O(n^2) performance can be a problem when inputs are untrusted, so there are use cases where people might legitimately prefer an O(n) solution even if it's a bit slower in common cases.

-- ___ Python tracker <http://bugs.python.org/issue20727> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment: But now that I look at the code more carefully, the old recipe also has O(n^2) behaviour, because cycle(islice(nexts, pending)) costs O(n) and is called O(n) times. To have worst-case O(n) behaviour, you'd need something like this:

    from collections import deque

    def roundrobin3(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        nexts = deque(iter(it).__next__ for it in iterables)
        while nexts:
            try:
                while True:
                    yield nexts[0]()
                    nexts.rotate(-1)
            except StopIteration:
                nexts.popleft()

    >>> from timeit import timeit
    >>> test = [tuple(range(1000))] + [()] * 1000
    >>> timeit(lambda:list(roundrobin1(*test)), number=100)  # old recipe
    5.184364624001319
    >>> timeit(lambda:list(roundrobin2(*test)), number=100)  # new recipe
    5.139592286024708
    >>> timeit(lambda:list(roundrobin3(*test)), number=100)
    0.16217014100402594

-- ___ Python tracker <http://bugs.python.org/issue20727> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20774] collections.deque should ship with a stdlib json serializer
Gareth Rees added the comment: The JSON implementation uses these tests to determine how to serialize a Python object:

    isinstance(o, (list, tuple))
    isinstance(o, dict)

So any subclasses of list and tuple are serialized as a list, and any subclass of dict is serialized as an object. For example:

    >>> json.dumps(collections.defaultdict())
    '{}'
    >>> json.dumps(collections.OrderedDict())
    '{}'
    >>> json.dumps(collections.namedtuple('mytuple', ())())
    '[]'

When deserialized, you'll get back a plain dictionary or list, so there's no round-trip property here. The tests could perhaps be changed to:

    isinstance(o, collections.abc.Sequence)
    isinstance(o, collections.abc.Mapping)

I'm not a JSON expert, so I have no informed opinion on whether this is a good idea or not, but in any case, this change wouldn't help with deques, as a deque is not a Sequence. That's because deques don't have an index method (see issue10059 and issue12543).

-- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue20774> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
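In the meantime, the usual workaround is json.dumps's default hook, with the same lack of a round-trip property described above; a sketch (encode_extra is a name invented for this example):

    import collections
    import json

    def encode_extra(o):
        # Serialize deques as JSON arrays; let everything else fail.
        if isinstance(o, collections.deque):
            return list(o)
        raise TypeError("%r is not JSON serializable" % (o,))

    print(json.dumps(collections.deque([1, 2, 3]), default=encode_extra))
    # prints: [1, 2, 3]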
[issue20905] Adapt heapq push/pop/replace to allow passing a comparator.
Gareth Rees added the comment: It would be better to accept a key function instead of a comparison function (cf. heapq.nlargest and heapq.nsmallest). But note that this has been proposed before and rejected: see issue1904 where Raymond Hettinger provides this rationale:

    Use cases aside, there is another design issue in that the
    key-function approach doesn't work well with the heap functions on
    regular lists. Successive calls to heap functions will of necessity
    call the key-function multiple times for any given element. This
    contrasts with sort() where the whole purpose of the key function
    was to encapsulate the decorate-sort-undecorate pattern which was
    desirable because the key-function called exactly once per element.

However, in the case of the bisect module (where requests for a key function are also common), Guido was recently persuaded that there was a valid use case. See issue4356, and this thread on the Python-ideas mailing list: <https://mail.python.org/pipermail/python-ideas/2012-February/thread.html#13650> where Arnaud Delobelle points out that:

    Also, in Python 3 one can't assume that values will be comparable
    so the (key, value) tuple trick won't work: comparing the tuples
    may well throw a TypeError.

and Guido responds:

    Bingo. That clinches it. We need to add key=.

-- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue20905> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
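A quick demonstration of the (key, value) tuple trick failing in Python 3, as Arnaud describes (the exact wording of the TypeError varies between 3.x versions):

    >>> import heapq
    >>> heap = []
    >>> heapq.heappush(heap, (2, {'b': 1}))
    >>> heapq.heappush(heap, (2, {'a': 1}))  # equal keys: dicts compared
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: dict() < dict()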
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Is there any chance of making progress on this issue? Is there anything wrong with my patch? Did I omit any relevant point in my message of 2016-06-11 16:26? It would be nice if this were not left in limbo for another four years. -- ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: On Windows, under cmd.exe, you can use %errorlevel% to inspect the exit code of the last command.

-- ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment: Here's a patch that implements my proposal (1) -- under this patch, tokens read from an input stream belong to a subtype of str with startline and endline attributes giving the line numbers of the first and last character of the token. This allows the accurate reporting of error messages relating to a token. I updated the documentation and added a test case. -- keywords: +patch Added file: http://bugs.python.org/file46479/issue24869.patch ___ Python tracker <http://bugs.python.org/issue24869> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
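A minimal sketch of the str-subclass approach, using the attribute names from the description above (startline and endline); the attached patch is authoritative and may differ in detail:

    class Token(str):
        """A string that records which input lines it came from."""
        def __new__(cls, value, startline, endline):
            self = super().__new__(cls, value)
            self.startline = startline  # line number of first character
            self.endline = endline      # line number of last character
            return self

    tok = Token('"bar\nbaz"', 1, 2)
    print(repr(tok), tok.startline, tok.endline)  # still an ordinary str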
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Thanks for the revised patch, Mark. The new tests look good. -- ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Thank you, Mark (and everyone else who helped). -- ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
Gareth Rees added the comment: I've attached a revised patch that addresses Berker Peksag's concerns: 1. The message associated with the IndexError is now "address out of range" with no information about which address failed or why. 2. There's a new test case for an IndexError from an IPv6 address lookup. -- Added file: http://bugs.python.org/file43341/ipaddress.patch ___ Python tracker <http://bugs.python.org/issue20508> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Let's not allow the perfect to be the enemy of the good here. The issue I reported is a very specific one: in Python 2.7, if you pass a long to sys.exit, then the value of the long is not used as the exit code. This is bad because functions like os.spawnv that return exit codes (that you might reasonably want to pass on to sys.exit) can return them as long. My patch only proposes to address this one issue. In order to keep the impact as small as possible, I do not propose to make any other changes, or address any other problems. But in the comments here people have brought up THREE other issues:

1. Alexander Belopolsky expresses the concern that "(int)PyLong_AsLong(value) can silently convert non-zero error code to zero." This is not a problem introduced by my patch -- the current code is:

    exitcode = (int)PyInt_AsLong(value);

which has exactly the same problem (because PyIntObject stores its value as a long). So this concern (even if valid) is not a reason to reject my patch.

2. Ethan Furman wrote: "we need to protect against overflow from <long> to <int>". But again, this is not a problem introduced by my patch. The current code says:

    exitcode = (int)PyInt_AsLong(value);

and my patch does not change this line. The possibility of this overflow is not a reason to reject my patch.

3. Alexander says, "Passing anything other than one of the os.EX_* constants to sys.exit() is a bad idea." First, this is not a problem introduced by my patch. The existing code in Python 2.7 allows you to specify other exit codes. So this problem (if it is a problem) is not a reason to reject my patch. Second, this claim is surely not right -- when a subprocess fails it often makes sense to pass on the exit code of the subprocess, whatever that is. This is exactly the use case that I mentioned in my original report (that is, passing on the exit code from os.spawnv to sys.exit).

-- ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
Gareth Rees added the comment: Thank you for applying this patch. -- ___ Python tracker <http://bugs.python.org/issue20508> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue27306] Grammatical Error in Documentation - Tarfile page
Gareth Rees added the comment: Here's a patch improving the grammar in the tarfile documentation. -- keywords: +patch nosy: +Gareth.Rees Added file: http://bugs.python.org/file43375/issue27306.patch ___ Python tracker <http://bugs.python.org/issue27306> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment: Just to restate the problem: The use case is that when emitting an error message for a token, we want to include the number of the line containing the token (or the number of the line where the token started, if the token spans multiple lines, as it might if it's a string containing newlines). But there is no way to satisfy this use case given the features of the shlex module. In particular, shlex.lineno (which looks as if it ought to help) is actually the line number of the first character that has not yet been consumed by the lexer, and in general this is not the same as the line number of the previous (or the next) token. I can think of two alternatives that would satisfy the use case: 1. Instead of returning tokens as str objects, return them as instances of a subclass of str that has a property that gives the line number of the first character of the token. (Maybe it should also have properties for the column number of the first character, and the line and column number of the last character too? These properties would support better error messages.) 2. Add new methods that return tuples giving the token and its line number (and possibly column number etc. as in alternative 1). My preference would be for alternative (1), but I suppose there is a very tiny risk of breaking some code that relied upon get_token returning an instance of str exactly rather than an instance of a subclass of str. -- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue24869> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment: A third alternative: 3. Add a method whose effect is to consume comments and whitespace, but which does not yield a token. You could then call this method, and then look at shlex.lineno, which will be the line number of the first character of the next token (if there is a next token). -- ___ Python tracker <http://bugs.python.org/issue24869> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue27588] Type objects are hashable and comparable for equality but this is not documented
New submission from Gareth Rees: The type objects constructed by the metaclasses in the typing module are hashable and comparable for equality:

    >>> from typing import *
    >>> {Mapping[str, int], Mapping[int, str]}
    {typing.Mapping[int, str], typing.Mapping[str, int]}
    >>> Union[str, int, float] == Union[float, int, str]
    True
    >>> List[int] == List[float]
    False

but this is not clearly documented in the documentation for the typing module (there are a handful of examples using equality, but it's not explicit that these are runnable). It would be nice if there were explicit documentation for these properties of type objects.

-- assignee: docs@python components: Documentation messages: 270981 nosy: Gareth.Rees, docs@python priority: normal severity: normal status: open title: Type objects are hashable and comparable for equality but this is not documented type: enhancement versions: Python 3.5, Python 3.6 ___ Python tracker <http://bugs.python.org/issue27588> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com