[issue15443] datetime module has no support for nanoseconds
Gareth Rees added the comment:

I also have a use case that would benefit from nanosecond resolution in Python's datetime objects: representing and querying the results of clock_gettime() in a program trace. On modern Linuxes with a vDSO, clock_gettime() does not require a system call and completes within a few nanoseconds, so Python's datetime objects (which have only microsecond resolution) cannot distinguish between adjacent calls to clock_gettime(). This means that, like Mark Dickinson above, I have to choose between using datetime for queries (which would be convenient) while accepting that nearby events in the trace may be indistinguishable, or implementing my own datetime-like data structure.

--
nosy: +g...@garethrees.org
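A minimal sketch of the resolution problem. It uses time.clock_gettime_ns(), which was added in Python 3.7 (later than this comment) and is Unix-only; exact timings are machine-dependent:

    import time
    from datetime import datetime, timezone

    # Two adjacent clock readings, typically a few tens of
    # nanoseconds apart.
    t1 = time.clock_gettime_ns(time.CLOCK_REALTIME)
    t2 = time.clock_gettime_ns(time.CLOCK_REALTIME)

    # Converting to datetime truncates to microsecond resolution, so
    # distinct readings can collapse into equal datetime objects.
    d1 = datetime.fromtimestamp(t1 / 1e9, tz=timezone.utc)
    d2 = datetime.fromtimestamp(t2 / 1e9, tz=timezone.utc)
    print(t1 == t2)  # almost always False
    print(d1 == d2)  # often True: the nanoseconds have been lost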
[issue46065] re.findall takes forever and never ends
Gareth Rees added the comment:

The way to avoid this behaviour is to disallow the attempts at matching that you know are going to fail. As Serhiy described above, if the search fails starting at the first character of the string, it will move forward and try again starting at the second character. But you know that this new attempt must fail, so you can force the regular expression engine to discard the attempt immediately.

Here's an illustration in a simpler setting, where we are looking for all strings of 'a' followed by 'b':

    >>> import re
    >>> from timeit import timeit
    >>> text = 'a' * 10
    >>> timeit(lambda: re.findall(r'a+b', text), number=1)
    6.64353118114

We know that any successful match must be preceded by a character other than 'a' (or the beginning of the string), so we can reject many unsuccessful matches like this:

    >>> timeit(lambda: re.findall(r'(?:^|[^a])(a+b)', text), number=1)
    0.00374348114981

In your case, a successful match must be preceded by [^a-zA-Z0-9_.+-] (or the beginning of the string).

--
nosy: +g...@garethrees.org
[issue46065] re.findall takes forever and never ends
Gareth Rees added the comment:

This kind of question is frequently asked (#3128, #29977, #28690, #30973, #1737127, etc.), and so maybe it deserves an answer somewhere in the Python documentation.

--
resolution: -> wont fix
stage: -> resolved
status: open -> closed
[issue12514] timeit disables garbage collection if timed code raises an exception
New submission from Gareth Rees:

If you call timeit.timeit and the timed code raises an exception, then garbage collection is disabled. I have verified this in Python 2.7 and 3.2. Here's an interaction with Python 3.2:

    Python 3.2 (r32:88445, Jul 7 2011, 15:52:49)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import timeit, gc
    >>> gc.isenabled()
    True
    >>> timeit.timeit('raise Exception')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/timeit.py", line 228, in timeit
        return Timer(stmt, setup, timer).timeit(number)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/timeit.py", line 194, in timeit
        timing = self.inner(it, self.timer)
      File "<timeit-src>", line 6, in inner
    Exception
    >>> gc.isenabled()
    False

The problem is with the following code in Lib/timeit.py (lines 192–196):

    gcold = gc.isenabled()
    gc.disable()
    timing = self.inner(it, self.timer)
    if gcold:
        gc.enable()

This should be changed to something like this:

    gcold = gc.isenabled()
    gc.disable()
    try:
        timing = self.inner(it, self.timer)
    finally:
        if gcold:
            gc.enable()

--
components: Library (Lib)
messages: 139978
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: timeit disables garbage collection if timed code raises an exception
type: behavior
versions: Python 2.7, Python 3.2
[issue12514] timeit disables garbage collection if timed code raises an exception
Gareth Rees added the comment:

Patch attached.

--
keywords: +patch
Added file: http://bugs.python.org/file22605/issue12514.patch
[issue12675] tokenize module happily tokenizes code with syntax errors
New submission from Gareth Rees:

The tokenize module is happy to tokenize Python source code that the real tokenizer would reject. Pretty much any instance where tokenizer.c returns ERRORTOKEN will illustrate this feature. Here are some examples:

    Python 3.3.0a0 (default:2d69900c0820, Aug 1 2011, 13:46:51)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from tokenize import generate_tokens
    >>> from io import StringIO
    >>> def tokens(s):
    ...     """Return a string showing the tokens in the string s."""
    ...     return '|'.join(t[1] for t in generate_tokens(StringIO(s).readline))
    ...
    >>> # Bad exponent
    >>> print(tokens('1if 2else 3'))
    1|if|2|else|3|
    >>> 1if 2else 3
      File "<stdin>", line 1
        1if 2else 3
             ^
    SyntaxError: invalid token
    >>> # Bad hexadecimal constant.
    >>> print(tokens('0xfg'))
    0xf|g|
    >>> 0xfg
      File "<stdin>", line 1
        0xfg
           ^
    SyntaxError: invalid syntax
    >>> # Missing newline after continuation character.
    >>> print(tokens('\\pass'))
    \|pass|
    >>> \pass
      File "<stdin>", line 1
        \pass
            ^
    SyntaxError: unexpected character after line continuation character

It is surprising that the tokenize module does not yield the same tokens as Python itself, but as this limitation only affects incorrect Python code, perhaps it just needs a mention in the tokenize documentation. Something along the lines of: "The tokenize module generates the same tokens as Python's own tokenizer if it is given correct Python code. However, it may incorrectly tokenize Python code containing syntax errors that the real tokenizer would reject."

--
components: Library (Lib)
messages: 141503
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: tokenize module happily tokenizes code with syntax errors
type: behavior
versions: Python 3.3
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees added the comment:

These errors are generated directly by the tokenizer. In tokenizer.c, the tokenizer generates ERRORTOKEN when it encounters something it can't tokenize. This causes parsetok() in parsetok.c to stop tokenizing and return an error.
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees added the comment:

I'm having a look to see if I can make tokenize.py better match the real tokenizer, but I need some feedback on a couple of design decisions.

First, how to handle tokenization errors? There are three possibilities:

1. Generate an ERRORTOKEN, resynchronize, and continue to tokenize from after the error. This is what tokenize.py currently does in the two cases where it detects an error.

2. Generate an ERRORTOKEN and stop tokenizing. This is what tokenizer.c does.

3. Raise an exception (IndentationError, SyntaxError, or TabError). This is what the user sees when the parser is invoked from pythonrun.c.

Since the documentation for tokenize.py says, "It is designed to match the working of the Python tokenizer exactly", I think that implementing option (2) is best here. (This will mean changing the behaviour of tokenize.py in the two cases where it currently detects an error, so that it stops tokenizing.)

Second, how to record the cause of the error? The real tokenizer records the cause of the error in the 'done' field of the 'tok_state' structure, but tokenize.py loses this information. I propose to add fields to the TokenInfo structure (which is a namedtuple) to record this information. The real tokenizer uses numeric constants from errcode.h (E_TOODEEP, E_TABSPACE, E_DEDENT, etc.), and pythonrun.c converts these to English-language error messages (E_TOODEEP: "too many levels of indentation"). Both of these pieces of information will be useful, so I propose to add two fields: "error" (containing a string like "TOODEEP") and "errormessage" (containing the English-language error message).
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees added the comment:

Having looked at some of the consumers of the tokenize module, I don't think my proposed solutions will work.

It seems to be the case that the resynchronization behaviour of tokenize.py is important for consumers that are using it to transform arbitrary Python source code (like 2to3.py). These consumers are relying on the "roundtrip" property that X == untokenize(tokenize(X)). So solution (1) is necessary for the handling of tokenization errors.

Also, the fact that TokenInfo is a 5-tuple is relied on in some places (e.g. lib2to3/patcomp.py line 38), so it can't be extended. And there are consumers (though none in the standard library) that are relying on type=ERRORTOKEN being the way to detect errors in a tokenization stream. So I can't overload that field of the structure.

Any good ideas for how to record the cause of error without breaking backwards compatibility?
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees added the comment:

Ah ... TokenInfo is a *subclass* of namedtuple, so I can add extra properties to it without breaking consumers that expect it to be a 5-tuple.
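A minimal sketch of why this works. Here is_error is a hypothetical extra property for illustration, not something from the eventual patch:

    from collections import namedtuple
    from token import ERRORTOKEN

    class TokenInfo(namedtuple('TokenInfo',
                               ['type', 'string', 'start', 'end', 'line'])):
        # Extra behaviour layered on top of the 5-tuple; code that
        # unpacks five elements keeps working unchanged.
        @property
        def is_error(self):
            return self.type == ERRORTOKEN

    t = TokenInfo(ERRORTOKEN, '!', (1, 0), (1, 1), '!\n')
    a, b, c, d, e = t   # still unpacks as a 5-tuple
    print(t.is_error)   # True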
[issue12691] tokenize.untokenize is broken
New submission from Gareth Rees:

tokenize.untokenize is completely broken.

    Python 3.2.1 (default, Jul 19 2011, 00:09:43)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tokenize, io
    >>> t = list(tokenize.tokenize(io.BytesIO('1+1'.encode('utf8')).readline))
    >>> tokenize.untokenize(t)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 250, in untokenize
        out = ut.untokenize(iterable)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 179, in untokenize
        self.add_whitespace(start)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 165, in add_whitespace
        assert row <= self.prev_row
    AssertionError

The assertion is simply bogus: the <= should be >=. The reason why no-one has spotted this is that the unit tests for the tokenize module only ever call untokenize() in "compatibility" mode, passing in a 2-tuple instead of a 5-tuple.

I propose to fix this, and add unit tests, at the same time as fixing other problems with tokenize.py (issue12675).

--
components: Library (Lib)
messages: 141634
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: tokenize.untokenize is broken
type: behavior
versions: Python 3.2, Python 3.3
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment:

See my last paragraph: I propose to deliver a single patch that fixes both this bug and issue12675. I hope this is OK. (If you prefer, I'll try to split the patch in two.)

I just noticed another bug in untokenize(): in compatibility mode, if untokenize() is passed an iterator rather than a list, then the first token gets discarded:

    Python 3.3.0a0 (default:c099ba0a278e, Aug 2 2011, 12:35:03)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from tokenize import untokenize
    >>> from token import *
    >>> untokenize([(NAME, 'hello')])
    'hello '
    >>> untokenize(iter([(NAME, 'hello')]))
    ''

No-one's noticed this because the unit tests only ever pass lists to untokenize().
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment:

I think I can make these changes independently and issue two patches: one fixing the problems with untokenize listed here, and another improving tokenize.

I've just noticed a third bug in untokenize: in full mode, it doesn't handle backslash-continued lines correctly.

    Python 3.3.0a0 (default:c099ba0a278e, Aug 2 2011, 12:35:03)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from io import BytesIO
    >>> from tokenize import tokenize, untokenize
    >>> untokenize(tokenize(BytesIO('1 and \\\n not 2'.encode('utf8')).readline))
    b'1 andnot 2'
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees added the comment:

Terry: agreed. Does anyone actually use this module? Does anyone know what the design goals are for tokenize? If someone can tell me, I'll do my best to make it meet them.

Meanwhile, here's another bug. Each character of trailing whitespace is tokenized as an ERRORTOKEN.

    Python 3.3.0a0 (default:c099ba0a278e, Aug 2 2011, 12:35:03)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from tokenize import tokenize, untokenize
    >>> from io import BytesIO
    >>> list(tokenize(BytesIO('1 '.encode('utf8')).readline))
    [TokenInfo(type=57 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
     TokenInfo(type=2 (NUMBER), string='1', start=(1, 0), end=(1, 1), line='1 '),
     TokenInfo(type=54 (ERRORTOKEN), string=' ', start=(1, 1), end=(1, 2), line='1 '),
     TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment:

Please find attached a patch containing four bug fixes for untokenize():

* untokenize() now always returns a bytes object, defaulting to UTF-8 if no ENCODING token is found (previously it returned a string in this case).
* In compatibility mode, untokenize() successfully processes all tokens from an iterator (previously it discarded the first token).
* In full mode, untokenize() now returns successfully (previously it asserted).
* In full mode, untokenize() successfully processes tokens that were separated by a backslashed newline in the original source (previously it ran these tokens together).

In addition, I've added some unit tests:

* Test case for backslashed newline.
* Test case for missing ENCODING token.
* roundtrip() tests both modes of untokenize() (previously it just tested compatibility mode).

and updated the documentation:

* Update the docstring for untokenize to better describe its actual behaviour, and remove the false claim "Untokenized source will match input source exactly". (We can restore this claim if we ever fix tokenize/untokenize so that it's true.)
* Update the documentation for untokenize in tokenize.rst to match the docstring.

I welcome review: this is my first proper patch to Python.

--
keywords: +patch
Added file: http://bugs.python.org/file22842/Issue12691.patch
[issue12700] test_faulthandler fails on Mac OS X Lion
New submission from Gareth Rees:

On Mac OS 10.7, test_faulthandler fails. See test output below. It looks as though the tests may be at fault in expecting to see "(?:Segmentation fault|Bus error)" instead of "(?:Segmentation fault|Bus error|Illegal instruction)".

    test_disable (__main__.FaultHandlerTests) ... ok
    test_dump_traceback (__main__.FaultHandlerTests) ... ok
    test_dump_traceback_file (__main__.FaultHandlerTests) ... ok
    test_dump_traceback_threads (__main__.FaultHandlerTests) ... ok
    test_dump_traceback_threads_file (__main__.FaultHandlerTests) ... ok
    test_dump_tracebacks_later (__main__.FaultHandlerTests) ... ok
    test_dump_tracebacks_later_cancel (__main__.FaultHandlerTests) ... ok
    test_dump_tracebacks_later_file (__main__.FaultHandlerTests) ... ok
    test_dump_tracebacks_later_repeat (__main__.FaultHandlerTests) ... ok
    test_dump_tracebacks_later_twice (__main__.FaultHandlerTests) ... ok
    test_enable_file (__main__.FaultHandlerTests) ... FAIL
    test_enable_single_thread (__main__.FaultHandlerTests) ... FAIL
    test_fatal_error (__main__.FaultHandlerTests) ... ok
    test_gil_released (__main__.FaultHandlerTests) ... FAIL
    test_is_enabled (__main__.FaultHandlerTests) ... ok
    test_read_null (__main__.FaultHandlerTests) ... FAIL
    test_register (__main__.FaultHandlerTests) ... ok
    test_register_chain (__main__.FaultHandlerTests) ... ok
    test_register_file (__main__.FaultHandlerTests) ... ok
    test_register_threads (__main__.FaultHandlerTests) ... ok
    test_sigabrt (__main__.FaultHandlerTests) ... ok
    test_sigbus (__main__.FaultHandlerTests) ... ok
    test_sigfpe (__main__.FaultHandlerTests) ... ok
    test_sigill (__main__.FaultHandlerTests) ... ok
    test_sigsegv (__main__.FaultHandlerTests) ... ok
    test_stack_overflow (__main__.FaultHandlerTests) ... ok
    test_unregister (__main__.FaultHandlerTests) ... ok

    ======================================================================
    FAIL: test_enable_file (__main__.FaultHandlerTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_faulthandler.py", line 207, in test_enable_file
        filename=filename)
      File "test_faulthandler.py", line 105, in check_fatal_error
        self.assertRegex(output, regex)
    AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nCurrent\\ thread\\ XXX:\n  File "<string>", line 4 in <module>$' not found in 'Fatal Python error: Illegal instruction\n\nCurrent thread XXX:\n  File "<string>", line 4 in <module>'

    ======================================================================
    FAIL: test_enable_single_thread (__main__.FaultHandlerTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_faulthandler.py", line 217, in test_enable_single_thread
        all_threads=False)
      File "test_faulthandler.py", line 105, in check_fatal_error
        self.assertRegex(output, regex)
    AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nTraceback\\ \\(most\\ recent\\ call\\ first\\):\n  File "<string>", line 3 in <module>$' not found in 'Fatal Python error: Illegal instruction\n\nTraceback (most recent call first):\n  File "<string>", line 3 in <module>'

    ======================================================================
    FAIL: test_gil_released (__main__.FaultHandlerTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_faulthandler.py", line 195, in test_gil_released
        '(?:Segmentation fault|Bus error)')
      File "test_faulthandler.py", line 105, in check_fatal_error
        self.assertRegex(output, regex)
    AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nCurrent\\ thread\\ XXX:\n  File "<string>", line 3 in <module>$' not found in 'Fatal Python error: Illegal instruction\n\nCurrent thread XXX:\n  File "<string>", line 3 in <module>'

    ======================================================================
    FAIL: test_read_null (__main__.FaultHandlerTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_faulthandler.py", line 115, in test_read_null
        '(?:Segmentation fault|Bus error)')
      File "test_faulthandler.py", line 105, in check_fatal_error
        self.assertRegex(output, regex)
    AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nCurrent\\ thread\\ XXX:\n  File "<string>", line 3 in <module>$' not found in 'Fatal Pyth
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment:

Thanks Ezio for the review. I've made all the changes you requested, except for the re-ordering of paragraphs in the documentation, which I don't want to do because that would lead to the "round-trip property" being mentioned before it's defined. Revised patch attached.

--
Added file: http://bugs.python.org/file22844/Issue12691.patch
[issue12700] test_faulthandler fails on Mac OS X Lion
Gareth Rees added the comment:

After changing NULL to (int *)1, all tests pass.
[issue12700] test_faulthandler fails on Mac OS X Lion
Gareth Rees added the comment:

All tests now pass.
[issue45476] [C API] Convert "AS" functions, like PyFloat_AS_DOUBLE(), to static inline functions
Gareth Rees added the comment:

If the problem is accidental use of the result of PyFloat_AS_DOUBLE() as an lvalue, why not use the comma operator to ensure that the result is an rvalue? The C99 standard says "A comma operator does not yield an lvalue" in §6.5.17; I imagine there is similar text in other versions of the standard.

The idea would be to define a helper macro like this:

    /* As expr, but can only be used as an rvalue. */
    #define Py_RVALUE(expr) ((void)0, (expr))

and then use the helper where needed, for example:

    #define PyFloat_AS_DOUBLE(op) Py_RVALUE(((PyFloatObject *)(op))->ob_fval)

--
nosy: +g...@garethrees.org
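A self-contained sketch of the idea; RVALUE is a stand-in name for the proposed Py_RVALUE, so the example compiles outside CPython:

    #include <stdio.h>

    /* As expr, but usable only as an rvalue: a comma expression is
       not an lvalue (C99 §6.5.17). */
    #define RVALUE(expr) ((void)0, (expr))

    int main(void)
    {
        int x = 1;
        int y = RVALUE(x);   /* reading through the macro is fine */
        /* RVALUE(x) = 2; */ /* compile error: not assignable */
        printf("%d\n", y);
        return 0;
    }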
[issue45643] SIGSTKFLT is missing from the signals module on Linux
New submission from Gareth Rees:

BACKGROUND

On Linux, "man 7 signal" includes SIGSTKFLT in its table of "various other signals":

    Signal      Value     Action   Comment
    ─────────────────────────────────────────────────────────────────
    SIGSTKFLT   -,16,-    Term     Stack fault on coprocessor (unused)

Here "-,16,-" means that the signal is defined with the value 16 on x86 and ARM but not on Alpha, SPARC or MIPS. I believe that the intention was to use SIGSTKFLT for stack faults on the x87 math coprocessor, but this was either removed or never implemented, so that the signal is defined in /usr/include/signal.h but not used by the Linux kernel.

USE CASE

SIGSTKFLT is one of a handful of signals that are not used by the kernel, so that user-space programs are free to use it for their own purposes, for example for inter-thread or inter-process pre-emptive communication. Accordingly, it would be nice if the name SIGSTKFLT were available in the Python signal module on the platforms where the signal is available, for use and reporting in these cases.

--
components: Library (Lib)
messages: 405174
nosy: g...@garethrees.org
priority: normal
severity: normal
status: open
title: SIGSTKFLT is missing from the signals module on Linux
type: enhancement
versions: Python 3.11
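A sketch of the intended use, assuming Linux on x86/ARM where the signal's value is 16; the getattr fallback covers Pythons that do not yet export the name (which is the point of this issue):

    import os
    import signal

    # Fall back to the raw value on Pythons without signal.SIGSTKFLT.
    SIGSTKFLT = getattr(signal, "SIGSTKFLT", 16)

    def handler(signum, frame):
        print("received signal", signum)

    # The kernel never sends SIGSTKFLT, so a program is free to use
    # it for its own pre-emptive communication.
    signal.signal(SIGSTKFLT, handler)
    os.kill(os.getpid(), SIGSTKFLT)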
[issue45643] SIGSTKFLT is missing from the signals module on Linux
Change by Gareth Rees:

--
keywords: +patch
pull_requests: +27529
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/29266
[issue45643] SIGSTKFLT is missing from the signals module on Linux
Gareth Rees added the comment:

Tagging vstinner as you have touched Modules/signalmodule.c a few times in the last year. What do you think?

--
nosy: +vstinner
[issue17005] Add a topological sort algorithm
Gareth Rees added the comment:

I'd like to push back on the idea that graphs with isolated vertices are "unusual cases", as suggested by Raymond.

A very common use case (possibly the most common) for topological sorting is job scheduling. In this use case you have a collection of jobs, some of which have dependencies on other jobs, and you want to output a schedule according to which the jobs can be executed so that each job is executed after all its dependencies. In this use case, any job that has no dependencies, and is not itself a dependency of any other job, is an isolated vertex in the dependency graph.

This means that the proposed interface (that is, the interface taking only pairs of vertices) will not be suitable for this use case, and any programmer who tries to use it for this use case will be setting themselves up for failure. A sketch of the failure mode follows.
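In the sketch below, topsort is a stand-in Kahn-style implementation of the pair-based interface proposed in the PR (not the PR's actual code), and the job names are invented:

    from collections import defaultdict

    def topsort(edges):
        """Stand-in pair-based interface: edges are (before, after)."""
        deps = defaultdict(set)        # vertex -> unsatisfied dependencies
        for before, after in edges:
            deps[after].add(before)
            deps.setdefault(before, set())
        order = []
        while deps:
            ready = [v for v, d in deps.items() if not d]
            if not ready:
                raise ValueError("cycle detected")
            for v in ready:
                del deps[v]
                order.append(v)
            for d in deps.values():
                d.difference_update(ready)
        return order

    # Job 'c' has no dependencies and no dependents, so it never
    # appears in the edge list -- and so never appears in the schedule.
    jobs = {'a': [], 'b': ['a'], 'c': []}
    edges = [(dep, job) for job, deps in jobs.items() for dep in deps]
    print(topsort(edges))  # ['a', 'b'] -- job 'c' is silently missing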
[issue40707] Popen.communicate documentation does not say how to get the return code
New submission from Gareth Rees:

When using subprocess.Popen.communicate(), it is natural to wonder how to get the exit code of the subprocess. However, the documentation [1] says:

    Interact with process: Send data to stdin. Read data from stdout
    and stderr, until end-of-file is reached. Wait for process to
    terminate. The optional input argument should be data to be sent
    to the child process, or None, if no data should be sent to the
    child. If streams were opened in text mode, input must be a
    string. Otherwise, it must be bytes.

    communicate() returns a tuple (stdout_data, stderr_data). The data
    will be strings if streams were opened in text mode; otherwise,
    bytes.

If you can guess that communicate() might set returncode, then you can find what you need in the documentation for that attribute [2]:

    The child return code, set by poll() and wait() (and indirectly by
    communicate()).

I suggest that the documentation for communicate() be updated to mention that it sets the returncode attribute. This would be consistent with poll() and wait(), which already mention this.

[1]: https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate
[2]: https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncode

--
assignee: docs@python
components: Documentation
messages: 369502
nosy: docs@python, g...@garethrees.org
priority: normal
severity: normal
status: open
title: Popen.communicate documentation does not say how to get the return code
type: enhancement
versions: Python 3.10, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9
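A sketch of the pattern the documentation should point to (assumes a POSIX ls command, and Python 3.7+ for text=True):

    import subprocess

    proc = subprocess.Popen(
        ["ls", "/nonexistent"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    stdout, stderr = proc.communicate()
    # communicate() waited for the process, so returncode is now set.
    print("exit status:", proc.returncode)  # non-zero: ls failed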
[issue40707] Popen.communicate documentation does not say how to get the return code
Change by Gareth Rees:

--
keywords: +patch
pull_requests: +19559
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/20283
[issue40707] Popen.communicate documentation does not say how to get the return code
Gareth Rees added the comment:

The following test cases in test_subprocess.py call the communicate() method and then immediately assert that the returncode attribute has the expected value:

* test_stdout_none
* test_stderr_redirect_with_no_stdout_redirect
* test_stdout_filedes_of_stdout
* test_communicate_stdin
* test_universal_newlines_communicate_stdin
* test_universal_newlines_communicate_input_none
* test_universal_newlines_communicate_stdin_stdout_stderr
* test_nonexisting_with_pipes
* test_wait_when_sigchild_ignored
* test_startupinfo_copy
* test_close_fds_with_stdio

You'll see that some of these test for success (returncode == 0) and some for failure (returncode == 1). This seems like adequate test coverage to me, but if something is missing, let me know.
[issue40707] Popen.communicate documentation does not say how to get the return code
Gareth Rees added the comment:

Is there anything I can do to move this forward?
[issue41092] Report actual size from 'os.path.getsize'
Gareth Rees added the comment:

The proposed change adds a Boolean flag to os.path.getsize() so that it returns:

    os.stat(filename).st_blocks * 512

(where the 512 is the file system block size on Linux; some work is needed to make this portable to other operating systems).

The Boolean argument here would always be constant in practice -- that is, you'd always call it like this:

    virtual_size = os.path.getsize(filename, apparent=True)
    allocated_size = os.path.getsize(filename, apparent=False)

and never like this:

    x_size = os.path.getsize(filename, apparent=x)

where x varies at runtime. The "no constant bool arguments" design principle [1] suggests that this should be added as a new function, something like os.path.getallocatedsize().

[1] https://mail.python.org/pipermail/python-ideas/2016-May/040181.html

--
nosy: +g...@garethrees.org
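A sketch of what the suggested new function might look like, under the Linux-only 512-byte-unit assumption noted above:

    import os

    def getallocatedsize(path):
        """Return the number of bytes allocated on disk for path.

        st_blocks is counted in 512-byte units on Linux; other
        platforms may differ, so a portable version needs more work.
        """
        return os.stat(path).st_blocks * 512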
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
New submission from Gareth Rees:

The documentation for sys.exit says, "The optional argument arg can be an integer giving the exit status (defaulting to zero), or another type of object". However, the arguments that are treated as exit statuses are actually "subtypes of int". So, a bool argument is fine:

    $ python2.7 -c "import sys; sys.exit(False)"; echo $?
    0

But a long argument is not:

    $ python2.7 -c "import sys; sys.exit(long(0))"; echo $?
    0
    1

The latter behaviour can be surprising since functions like os.spawnv may return the exit status of the executed process as a long on some platforms, so that if you try to pass on the exit code via

    code = os.spawnv(...)
    sys.exit(code)

you may get a mysterious surprise: code is 0 but the exit code is 1.

It would be simple to change line 1112 of pythonrun.c from

    if (PyInt_Check(value))

to

    if (PyInt_Check(value) || PyLong_Check(value))

(This issue is not present in Python 3 because there is no longer a distinction between int and long.)

--
components: Library (Lib)
messages: 156470
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: sys.exit documents argument as "integer" but actually requires "subtype of int"
type: behavior
versions: Python 2.6, Python 2.7
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment:

> Wouldn't you also have to deal with possible errors from the
> PyInt_AsLong call?

Good point. But I note that Python 3 just does

    exitcode = (int)PyLong_AsLong(value);

so maybe it's not important to do error handling here.
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

In Python 2.7, multiprocessing.heap.Arena uses an anonymous memory mapping on Unix. Anonymous memory mappings can be shared between processes but only via fork(). But Python 3 supports other ways of starting subprocesses (see issue 8713 [1]) and so an anonymous memory mapping no longer works. So instead a temporary file is created, filled with zeros to the given size, and mapped into memory (see changeset 3b82e0d83bf9 [2]).

It is the zero-filling of the temporary file that takes the time, because this forces the operating system to allocate space on the disk. But why not use ftruncate() (instead of write()) to quickly create a file with holes? POSIX says [3], "If the file size is increased, the extended area shall appear as if it were zero-filled", which would seem to satisfy the requirement.

[1] https://bugs.python.org/issue8713
[2] https://hg.python.org/cpython/rev/3b82e0d83bf9
[3] http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html

--
nosy: +g...@garethrees.org
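A minimal sketch of the suggested approach (Unix-only; the size is arbitrary and the variable names are invented for illustration):

    import mmap
    import os
    import tempfile

    size = 100 * 1024 * 1024  # 100 MiB

    # Instead of writing 100 MiB of zeros, extend the file with a
    # hole; POSIX guarantees the extended area reads back as zeros.
    fd, name = tempfile.mkstemp()
    os.unlink(name)
    os.ftruncate(fd, size)
    buf = mmap.mmap(fd, size)
    assert buf[:4] == b"\x00\x00\x00\x00"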
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

Note that some filesystems (e.g. HFS+) don't support sparse files, so creating a large Arena will still be slow on these filesystems even if the file is created using ftruncate(). (This could be fixed, for the "fork" start method only, by using anonymous maps in that case.)
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

If you need the 2.7 behaviour (anonymous mappings) in 3.5 then you can still do it, with some effort. I think the approach that requires the smallest amount of work would be to ensure that subprocesses are started using fork(), by calling multiprocessing.set_start_method('fork'), and then monkey-patch multiprocessing.heap.Arena.__init__ so that it creates anonymous mappings using mmap.mmap(-1, size).

(I suggested above that Python could be modified to create anonymous mappings in the 'fork' case, but now that I look at the code in detail, I see that it would be tricky, because the Arena class has no idea about the Context in which it is going to be used -- at the moment you can create one shared object and then pass it to subprocesses under different Contexts, so the shared objects have to support the lowest common denominator.)
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

Nonetheless this is bound to be a nasty performance regression for many people doing big data processing with NumPy/SciPy/Pandas and multiprocessing and moving from 2 to 3, so even if it can't be fixed, the documentation ought to warn about the problem and explain how to work around it.
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

I see now that the default start method is 'fork' (except on Windows), so calling set_start_method is unnecessary.

Note that you don't have to edit multiprocessing/heap.py; you can "monkey-patch" it in the program that needs the anonymous mapping:

    import mmap
    from multiprocessing.heap import Arena

    def anonymous_arena_init(self, size, fd=-1):
        "Create Arena using an anonymous memory mapping."
        self.size = size
        self.fd = fd  # still kept but is not used!
        self.buffer = mmap.mmap(-1, self.size)

    Arena.__init__ = anonymous_arena_init

As for what it will break -- any code that uses the 'spawn' or 'forkserver' start methods.
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

I propose:

1. Ask Richard Oudkerk why in changeset 3b82e0d83bf9 the temporary file is zero-filled and not truncated. Perhaps there's some file system where this is necessary? (I tested HFS+, which doesn't support sparse files, and zero-filling seems not to be necessary, but maybe there's some other file system where it is?)

2. If there's no good reason for zero-filling the temporary file, replace it with a call to os.ftruncate(fd, size).

3. Update the documentation to mention the performance issue when porting multiprocessing code from 2 to 3. Unfortunately, I don't think there's any advice that the documentation can give that will help work around it -- monkey-patching works but is not supported.

4. Consider writing a fix, or at least a supported workaround. Here's a suggestion: update multiprocessing.sharedctypes and multiprocessing.heap so that they use anonymous maps in the 'fork' context. The idea is to update the RawArray and RawValue functions so that they take the context, and then pass the context down to _new_value, BufferWrapper.__init__ and thence to Heap.malloc, where it can be used to determine what kind of Arena (file-backed or anonymous) should be used to satisfy the allocation request. The Heap class would have to segregate its blocks according to what kind of Arena they come from.
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Gareth Rees added the comment:

Patch looks good to me. The test cases are not very systematic (why only int, double, and long long?), but that's not the fault of the patch and shouldn't prevent its being applied.

--
nosy: +g...@garethrees.org
[issue30943] printf-style bytes formatting sometimes does not work
Gareth Rees added the comment:

Test case minimization:

    Python 3.6.1 (default, Apr 24 2017, 06:18:27)
    [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> b'a\x00%(a)s' % {b'a': b'a'}
    b'a\x00%(a)s'

It seems that all formatting operations after a zero byte are ignored. This is because the code for parsing the format string (in _PyBytes_FormatEx in Objects/bytesobject.c) uses the following approach to find the next % character:

    while (--fmtcnt >= 0) {
        if (*fmt != '%') {
            Py_ssize_t len;
            char *pos;

            pos = strchr(fmt + 1, '%');

But strchr uses the C notion of strings, which are terminated by a zero byte.

--
nosy: +g...@garethrees.org
[issue30943] printf-style bytes formatting sometimes does not work
Gareth Rees added the comment:

This was already noted in issue29714 and fixed by Xiang Zhang in commit b76ad5121e2.
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Changes by Gareth Rees:

--
nosy: +benjamin.peterson
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Gareth Rees added the comment:

Has Antony Lee made a copyright assignment?
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Gareth Rees added the comment:

(If he hasn't, I don't think I can make a PR, because I read his patch and so any implementation I make now is based on his patch and so potentially infringes his copyright.)
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Changes by Gareth Rees:

--
pull_requests: +2801
[issue30973] Regular expression "hangs" interpreter
Gareth Rees added the comment:

This is the usual exponential backtracking behaviour of Python's regex engine. The problem is that the regex (?:[^*]+|\*[^/])* can match against a string in exponentially many ways, and Python's regex engine tries all of them before giving up.

--
nosy: +g...@garethrees.org
[issue30976] multiprocessing.Process.is_alive can show True for dead processes
Gareth Rees added the comment:

This is a race condition -- when os.kill returns, that means that the signal has been delivered, but it does not mean that the subprocess has exited yet. You can see this by inserting a sleep after the kill and before the liveness check:

    print(proc.is_alive())
    os.kill(proc.pid, signal.SIGTERM)
    time.sleep(1)
    print(proc.is_alive())

This (probably) gives the process time to exit. (Presumably the psutil.pid_exists() call has a similar effect.) Of course, waiting for 1 second (or any amount of time) might not be enough. The right thing to do is to join the process. Then when the join exits you know it died.
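A minimal sketch of the recommended fix (Unix-only, since it uses os.kill and a negative exitcode; the worker function is invented):

    import multiprocessing
    import os
    import signal
    import time

    def worker():
        time.sleep(60)

    if __name__ == "__main__":
        proc = multiprocessing.Process(target=worker)
        proc.start()
        os.kill(proc.pid, signal.SIGTERM)
        proc.join()                # wait for the child to actually exit
        print(proc.is_alive())     # now reliably False
        print(proc.exitcode)       # -15, i.e. -signal.SIGTERM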
[issue24869] shlex lineno inaccurate with certain inputs
Changes by Gareth Rees:

--
pull_requests: +2849
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment:

I've made a pull request. (Not because I expect it to be merged as-is, but to provide a starting point for discussion.)

--
nosy: +petri.lehtinen, vinay.sajip
[issue17005] Add a topological sort algorithm
Gareth Rees added the comment:

I approve in general of the principle of including a topological sort algorithm in the standard library. However, I have three problems with the approach in PR 11583:

1. The name "topsort" is most naturally parsed as "top sort", which could be misinterpreted (as a sort that puts items on top in some way). If the name must be abbreviated then "toposort" would be better.

2. "Topological sort" is a terrible name: the analogy with topological graph theory is (i) unlikely to be helpful to anyone; and (ii) not quite right. I know that the name is widely used in computing, but a name incorporating "linearize" or "linear order" or "total order" would be much clearer.

3. The proposed interface is not suitable for all cases! The function topsort takes a list of directed edges and returns a linear order on the vertices in those edges (if any linear order exists). But this means that if there are any isolated vertices (that is, vertices with no edges) in the dependency graph, then there is no way of passing those vertices to the function. This means that (i) it is inconvenient to use the proposed interface, because you have to find the isolated vertices in your graph and add them to the linear order after calling the function; and (ii) it is a bug magnet, because many programmers will omit this step, meaning that their code will unexpectedly fail when their graph has an isolated vertex.

The interface needs to be redesigned to take the graph in some other representation.

--
nosy: +g...@garethrees.org
[issue17005] Add a topological sort algorithm
Gareth Rees added the comment:

Just to elaborate on what I mean by "bug magnet". (I'm sure Pablo understands this, but there may be other readers who would like to see it spelled out.)

Suppose that you have a directed graph represented as a mapping from a vertex to an iterable of its out-neighbours. Then the "obvious" way to get a total order on the vertices in the graph would be to generate the edges and pass them to topsort:

    def edges(graph):
        return ((v, w) for v, ww in graph.items() for w in ww)

    order = topsort(edges(graph))

This will appear to work fine if it is never tested with a graph that has isolated vertices (which would be an all too easy omission). To handle isolated vertices you have to remember to write something like this:

    reversed_graph = {v: [] for v in graph}
    for v, ww in graph.items():
        for w in ww:
            reversed_graph[w].append(v)

    order = topsort(edges(graph)) + [
        v for v, ww in graph.items()
        if not ww and not reversed_graph[v]]

I think it likely that beginner programmers will forget to do this and be surprised later on when their total order is missing some of the vertices.
[issue29977] re.sub stalls forever on an unmatched non-greedy case
Gareth Rees added the comment:

The problem here is that both "." and "\s" match a whitespace character, and because you have the re.DOTALL flag turned on this includes "\n", and so the number of different ways in which (.|\s)* can be matched against a string is exponential in the number of whitespace characters in the string.

It is best to design your regular expression so as to limit the number of different ways it can match. Here I recommend the expression:

    /\*(?:[^*]|\*[^/])*\*/

which can match in only one way.

--
nosy: +g...@garethrees.org
[issue29977] re.sub stalls forever on an unmatched non-greedy case
Gareth Rees added the comment:

See also issue28690, issue212521, issue753711, issue1515829, etc.
[issue30564] Base64 decoding gives incorrect outputs.
Gareth Rees added the comment:

RFC 4648 section 3.5 says:

    The padding step in base 64 and base 32 encoding can, if improperly
    implemented, lead to non-significant alterations of the encoded
    data. For example, if the input is only one octet for a base 64
    encoding, then all six bits of the first symbol are used, but only
    the first two bits of the next symbol are used. These pad bits MUST
    be set to zero by conforming encoders, which is described in the
    descriptions on padding below. If this property do not hold, there
    is no canonical representation of base-encoded data, and multiple
    base-encoded strings can be decoded to the same binary data. If
    this property (and others discussed in this document) holds, a
    canonical encoding is guaranteed. In some environments, the
    alteration is critical and therefore decoders MAY chose to reject
    an encoding if the pad bits have not been set to zero.

If decoders may choose to reject non-canonical encodings, then they may also choose to accept them. (That's the meaning of "MAY" in RFC 2119.) So I think Python's behaviour is conforming to the standard.

--
nosy: +g...@garethrees.org
[issue28647] python --help: -u is misdocumented as binary mode
Gareth Rees added the comment:

You're welcome.
[issue31895] Native hijri calendar support
Gareth Rees added the comment:

It is a substantial undertaking, requiring a great deal of expertise, to implement the Islamic calendar. The difficulty is that there are multiple versions of the calendar. In some places the calendar is based on human observation of the new moon, and so a database of past observations is needed (and future dates can't be represented). In other places the time of observability of the new moon is calculated according to an astronomical ephemeris (and different ephemerides are used in different places and at different times).

--
nosy: +g...@garethrees.org
[issue31895] Native hijri calendar support
Gareth Rees added the comment:

convertdate does not document which version of the Islamic calendar it uses, but looking at the source code, it seems that it uses a rule-based calendar which has a 30-year cycle with 11 leap years. This won't help Haneef, who wants the Umm al-Qura calendar.
[issue32194] When creating list of dictionaries and updating datetime objects one by one, all values are set to last one of the list.
Gareth Rees added the comment:

The behaviour of the * operator (and the associated gotcha) is documented under "Common sequence operations" [1]:

    Note that items in the sequence s are not copied; they are
    referenced multiple times. This often haunts new Python
    programmers ...

There is also an entry in the FAQ [2]:

    replicating a list with * doesn't create copies, it only creates
    references to the existing objects

[1] https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range
[2] https://docs.python.org/3/faq/programming.html#faq-multidimensional-list

--
nosy: +g...@garethrees.org
resolution: -> not a bug
stage: -> resolved
status: open -> closed
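A minimal demonstration of the gotcha and the usual fix (a list comprehension); the dictionary contents are invented for illustration:

    import datetime

    # One dict object referenced three times:
    rows = [{"when": None}] * 3
    rows[0]["when"] = datetime.datetime(2017, 12, 1)
    print(rows[1]["when"])  # 2017-12-01 00:00:00 -- same object!

    # Three independent dicts:
    rows = [{"when": None} for _ in range(3)]
    rows[0]["when"] = datetime.datetime(2017, 12, 1)
    print(rows[1]["when"])  # None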
[issue20941] pytime.c:184 and pytime.c:218: runtime error, outside the range of representable values of type 'long'
Gareth Rees added the comment:

> How did you get this warning?

This looks like runtime output from a program built using Clang/LLVM with -fsanitize=undefined. See here:

    http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation

Signed integer overflow is undefined behaviour, so by the time

    *sec = (time_t)intpart;

has been evaluated, the undefined behaviour has already happened. It is too late to check for it afterwards.

--
nosy: +Gareth.Rees
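A sketch of the kind of pre-conversion range check that avoids the undefined behaviour; it assumes time_t is no wider than long, and the function name and bounds are illustrative, not the actual CPython fix:

    #include <errno.h>
    #include <limits.h>
    #include <time.h>

    /* The range check happens *before* the conversion, because an
       out-of-range double-to-integer conversion is itself undefined
       behaviour and cannot be detected after the fact.  Note that
       (double)LONG_MAX rounds up to 2**63, so the upper comparison
       must be strict; the !(...) form also rejects NaN. */
    static int
    double_to_time_t(double intpart, time_t *sec)
    {
        if (!(intpart >= -(double)LONG_MAX && intpart < (double)LONG_MAX)) {
            errno = ERANGE;
            return -1;
        }
        *sec = (time_t)intpart;
        return 0;
    }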
[issue19362] Documentation for len() fails to mention that it works on sets
New submission from Gareth Rees:

The help text for the len() built-in function says:

    Return the number of items of a sequence or mapping.

This omits to mention that len() works on sets too. I suggest this be changed to:

    Return the number of items of a sequence, mapping, or set.

Similarly, the documentation for len() says:

    The argument may be a sequence (string, tuple or list) or a mapping
    (dictionary).

I suggest this be changed to:

    The argument may be a sequence (string, tuple or list), a mapping
    (dictionary), or a set.

(Of course, strictly speaking, len() accepts any object with a __len__ method, but sequences, mappings and sets are the ones that are built-in to the Python core, and so these are the ones it is important to mention in the help and the documentation.)

--
assignee: docs@python
components: Documentation
files: len-set.patch
keywords: patch
messages: 201019
nosy: Gareth.Rees, docs@python
priority: normal
severity: normal
status: open
title: Documentation for len() fails to mention that it works on sets
type: enhancement
Added file: http://bugs.python.org/file32313/len-set.patch
[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map
New submission from Gareth Rees:

In Python 2.7, future_builtins.map accepts None as its first (function) argument:

    Python 2.7.5 (default, Aug 1 2013, 01:01:17)
    >>> from future_builtins import map
    >>> list(map(None, range(3), 'ABC'))
    [(0, 'A'), (1, 'B'), (2, 'C')]

But in Python 3.x, map does not accept None as its first argument:

    Python 3.3.2 (default, May 21 2013, 11:50:47)
    >>> list(map(None, range(3), 'ABC'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'NoneType' object is not callable

The documentation says, "if you want to write code compatible with Python 3 builtins, import them from this module," so this incompatibility may give Python 2.7 programmers the false impression that a program which uses map(None, ...) is portable to Python 3.

I suggest that future_builtins.map in Python 2.7 should behave the same as map in Python 3: that is, it should raise a TypeError if None is passed as the first argument.

--
components: Library (Lib)
messages: 201020
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: Python 2.7's future_builtins.map is not compatible with Python 3's map
type: behavior
versions: Python 2.7
[issue19362] Documentation for len() fails to mention that it works on sets
Gareth Rees added the comment: I considered suggesting "container", but the problem is that "container" is used elsewhere to mean "object supporting the 'in' operator" (in particular, collections.abc.Container has a __contains__ method but no __len__ method). The abstract base class for "object with a length" is collections.abc.Sized, but I don't think using the term "sized" would be clear to users. -- ___ Python tracker <http://bugs.python.org/issue19362> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
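To make the terminology concrete: collections.abc.Container only promises the 'in' operator, and collections.abc.Sized only promises len(); neither implies the other, which is why "container" would be a misleading word for "object with a length":

    >>> from collections.abc import Container, Sized
    >>> class Haystack:
    ...     def __contains__(self, item):
    ...         return True
    ...
    >>> isinstance(Haystack(), Container), isinstance(Haystack(), Sized)
    (True, False)
    >>> issubclass(Sized, Container)
    False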
[issue20507] TypeError from str.join has no message
New submission from Gareth Rees: If you pass an object of the wrong type to str.join, Python raises a TypeError with no error message:

    Python 3.4.0b3 (default, Jan 27 2014, 02:26:41)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> ''.join(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError

It is unnecessarily hard to understand from this error what the problem actually was. Which object had the wrong type? What type should it have been? Normally a TypeError is associated with a message explaining which type was wrong, and what it should have been. For example:

    >>> b''.join(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can only join an iterable

It would be nice if the TypeError from ''.join(1) included a message like this. The reason for the lack of message is that PyUnicode_Join starts out by calling PySequence_Fast(seq, "") which suppresses the error message from PyObject_GetIter. This commit by Tim Peters is responsible: <http://hg.python.org/cpython/rev/8579859f198c>. The commit message doesn't mention the suppression of the message, so I assume it was an oversight. I suggest replacing the line:

    fseq = PySequence_Fast(seq, "");

in PyUnicode_Join in unicodeobject.c with:

    fseq = PySequence_Fast(seq, "can only join an iterable");

for consistency with bytes_join in stringlib/join.h. Patch attached.

-- components: Interpreter Core files: join.patch keywords: patch messages: 210200 nosy: Gareth.Rees priority: normal severity: normal status: open title: TypeError from str.join has no message type: behavior versions: Python 3.4 Added file: http://bugs.python.org/file33900/join.patch ___ Python tracker <http://bugs.python.org/issue20507> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
New submission from Gareth Rees: If you try to look up an out-of-range address from an object returned by ipaddress.ip_network, then ipaddress._BaseNetwork.__getitem__ raises an IndexError with no message:

    Python 3.4.0b3 (default, Jan 27 2014, 02:26:41)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ipaddress
    >>> ipaddress.ip_network('2001:db8::8/125')[100]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/ipaddress.py", line 601, in __getitem__
        raise IndexError
    IndexError

Normally an IndexError is associated with a message explaining the cause of the error. For example:

    >>> [].pop()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: pop from empty list

It would be nice if the IndexError from ipaddress._BaseNetwork.__getitem__ included a message like this. With the attached patch, the error message looks like this in the positive case:

    >>> ipaddress.ip_network('2001:db8::8/125')[100]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/gdr/hg.python.org/cpython/Lib/ipaddress.py", line 602, in __getitem__
        % (self, self.num_addresses))
    IndexError: 100 out of range 0..7 for 2001:db8::8/125

and like this in the negative case:

    >>> ipaddress.ip_network('2001:db8::8/125')[-100]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/gdr/hg.python.org/cpython/Lib/ipaddress.py", line 608, in __getitem__
        % (n - 1, self.num_addresses, self))
    IndexError: -100 out of range -8..-1 for 2001:db8::8/125

(If you have a better suggestion for how the error message should read, I could submit a revised patch. I suppose it could just say "address index out of range" for consistency with list.__getitem__ and str.__getitem__. But I think the extra information is likely to be helpful for the programmer who is trying to track down the cause of an error.)

-- components: Library (Lib) files: ipaddress.patch keywords: patch messages: 210224 nosy: Gareth.Rees priority: normal severity: normal status: open title: IndexError from ipaddress._BaseNetwork.__getitem__ has no message type: behavior versions: Python 3.4 Added file: http://bugs.python.org/file33903/ipaddress.patch ___ Python tracker <http://bugs.python.org/issue20508> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
Changes by Gareth Rees : -- type: behavior -> enhancement ___ Python tracker <http://bugs.python.org/issue20508> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19362] Documentation for len() fails to mention that it works on sets
Gareth Rees added the comment: Here's a revised patch using Ezio's suggestion ("Return the number of items of a sequence or container"). -- Added file: http://bugs.python.org/file33904/len-set.patch ___ Python tracker <http://bugs.python.org/issue19362> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19362] Documentation for len() fails to mention that it works on sets
Changes by Gareth Rees : -- title: Documentation for len() fails to mention that it works on sets -> Documentation for len() fails to mention that it works on sets versions: +Python 3.4 ___ Python tracker <http://bugs.python.org/issue19362> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Patch attached. I added a test case to Lib/test/test_sys.py. -- Added file: http://bugs.python.org/file33906/exit.patch ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20510] Test cases in test_sys don't match the comments
New submission from Gareth Rees: Lib/test/test_sys.py contains test cases with incorrect comments -- or comments with incorrect test cases, if you prefer:

    # call without argument
    try:
        sys.exit(0)
    except SystemExit as exc:
        self.assertEqual(exc.code, 0)
    ...

    # call with tuple argument with one entry
    # entry will be unpacked
    try:
        sys.exit(42)
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...

    # call with integer argument
    try:
        sys.exit((42,))
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...

(In the quote above I've edited out some inessential detail; see the file if you really want to know.) You can see that in the first test case sys.exit is called with an argument (although the comment claims otherwise); in the second it is called with an integer (not a tuple); and in the third it is called with a tuple (not an integer). These comments have been unchanged since the original commit by Walter Dörwald <http://hg.python.org/cpython/rev/6a1394660270>. I've attached a patch that corrects the first test case and swaps the comments for the second and third test cases:

    # call without argument
    rc = subprocess.call([sys.executable, "-c", "import sys; sys.exit()"])
    self.assertEqual(rc, 0)

    # call with integer argument
    try:
        sys.exit(42)
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...

    # call with tuple argument with one entry
    # entry will be unpacked
    try:
        sys.exit((42,))
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...

Note that in the first test case, sys.exit() with no argument actually raises SystemExit(None), so it's not sufficient to catch the SystemExit and check exc.code; I need to check that it actually gets translated to 0 on exit.

-- components: Tests files: exittest.patch keywords: patch messages: 210246 nosy: Gareth.Rees priority: normal severity: normal status: open title: Test cases in test_sys don't match the comments type: enhancement versions: Python 3.4 Added file: http://bugs.python.org/file33908/exittest.patch ___ Python tracker <http://bugs.python.org/issue20510> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map
Gareth Rees added the comment: What about a documentation change instead? The future_builtins chapter <http://docs.python.org/2/library/future_builtins.html> in the standard library documentation could note the incompatibility. I've attached a patch which adds the following note to the documentation for future_builtins.map:

    Note: In Python 3, map() does not accept None for the function
    argument. (zip() can be used instead.)

-- status: closed -> open ___ Python tracker <http://bugs.python.org/issue19363> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
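For illustration, the zip() substitution suggested in the note covers the common idiom where map(None, ...) is used to build tuples of corresponding elements; this works identically in Python 2 and 3:

    >>> list(zip(range(3), 'ABC'))
    [(0, 'A'), (1, 'B'), (2, 'C')]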
[issue20510] Test cases in test_sys don't match the comments
Gareth Rees added the comment: I normally try not to make changes "while we're in here" for fear of introducing errors! But I guess the test cases are less critical, so I've taken your review comments as a license to submit a revised patch that: * incorporates your suggestion to use assert_python_ok from test.script_helper, instead of subprocess.call; * replaces the other uses of subprocess.call with assert_python_failure and adds a check on stdout; * cleans up the assertion-testing code using the context manager form of unittest.TestCase.assertRaises. I've signed and submitted a contributor agreement as requested. -- Added file: http://bugs.python.org/file33914/exittest-1.patch ___ Python tracker <http://bugs.python.org/issue20510> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
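A rough sketch of the shape of the revised first test, assuming the 3.4-era helper location (test.script_helper; the helper later moved to test.support.script_helper); the committed patch is authoritative:

    from test.script_helper import assert_python_ok

    # "call without argument": run in a subprocess and check that the
    # interpreter's exit status is 0.
    rc, out, err = assert_python_ok('-c', 'import sys; sys.exit()')
    assert rc == 0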
[issue19362] Documentation for len() fails to mention that it works on sets
Gareth Rees added the comment: Here's a revised patch for Terry ("Return the number of items of a sequence or collection.") -- Added file: http://bugs.python.org/file33916/len-set.patch ___ Python tracker <http://bugs.python.org/issue19362> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment: Yury, let me see if I can move this issue forward. I clearly haven't done a good job of explaining these problems, how they are related, and why it makes sense to solve them together, so let me have a go now.

1. tokenize.untokenize() raises AssertionError if you pass it a sequence of tokens output from tokenize.tokenize(). This was my original problem report, and it's still not fixed in Python 3.4:

    Python 3.4.0b3 (default, Jan 27 2014, 02:26:41)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tokenize, io
    >>> t = list(tokenize.tokenize(io.BytesIO('1+1'.encode('utf8')).readline))
    >>> tokenize.untokenize(t)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 317, in untokenize
        out = ut.untokenize(iterable)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 246, in untokenize
        self.add_whitespace(start)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 232, in add_whitespace
        assert row <= self.prev_row
    AssertionError

This defeats any attempt to use the sequence:

    input code -> tokenize -> transform -> untokenize -> output code

to transform Python code. But this ought to be the main use case for the untokenize function! That's how I came across the problem in the first place, when I was starting to write Minipy <https://github.com/gareth-rees/minipy>.

2. Fixing problem #1 is easy (just swap <= for >=), but it raises the question: why wasn't this mistake caught by test_tokenize? There's a test function roundtrip() whose docstring says:

    Test roundtrip for `untokenize`. `f` is an open file or a string.
    The source code in f is tokenized, converted back to source code
    via tokenize.untokenize(), and tokenized again from the latter.
    The test fails if the second tokenization doesn't match the first.

If I don't fix the problem with roundtrip(), then how can I be sure I have fixed the problem? Clearly it's necessary to fix the test case and establish that it provokes the assertion. So why doesn't roundtrip() detect the error? Well, it turns out that tokenize.untokenize() has two modes of operation and roundtrip() only tests one of them. The documentation for tokenize.untokenize() is rather cryptic, and all it says is:

    Each element returned by the [input] iterable must be a token
    sequence with at least two elements, a token number and token
    value. If only two tokens are passed, the resulting output is poor.

By reverse-engineering the implementation, it seems that it has two modes of operation. In the first mode (which I have called "compatibility" mode after the method Untokenizer.compat() that implements it) you pass it tokens in the form of 2-element tuples (type, text). These must have exactly 2 elements. In the second mode (which I have called "full" mode based on the description "full input" in the docstring) you pass it tokens in the form of tuples with 5 elements (type, text, start, end, line). These are compatible with the namedtuples returned from tokenize.tokenize(). The "full" mode has the buggy assertion, but test_tokenize.roundtrip() only tests the "compatibility" mode. So I must (i) fix roundtrip() so that it tests both modes; (ii) improve the documentation for tokenize.untokenize() so that programmers have some chance of figuring this out in future!
3. As soon as I make roundtrip() test both modes it provokes the assertion failure. Good, so I can fix the assertion. Problem #1 solved. But now there are test failures in "full" mode:

    $ ./python.exe -m test test_tokenize
    [1/1] test_tokenize
    **
    File "/Users/gdr/hg.python.org/cpython/Lib/test/test_tokenize.py", line ?, in test.test_tokenize.__test__.doctests
    Failed example:
        for testfile in testfiles:
            if not roundtrip(open(testfile, 'rb')):
                print("Roundtrip failed for file %s" % testfile)
                break
        else:
            True
    Expected:
        True
    Got:
        Roundtrip failed for file /Users/gdr/hg.python.org/cpython/Lib/test/test_platform.py
    *
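For readers unfamiliar with the two modes described in point 2, here is a minimal sketch of both (the result comes back as bytes because the token stream begins with an ENCODING token; with the assertion fixed, both calls succeed, whereas previously the second raised AssertionError):

    import io, tokenize

    source = "1 + 1"
    tokens = list(tokenize.tokenize(io.BytesIO(source.encode('utf-8')).readline))

    # "Compatibility" mode: 2-element tuples (type, text). Positions
    # are guessed, so spacing may not match the input exactly.
    print(tokenize.untokenize((tok[0], tok[1]) for tok in tokens))

    # "Full" mode: the 5-element tuples exactly as produced by
    # tokenize.tokenize().
    print(tokenize.untokenize(tokens))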
[issue12691] tokenize.untokenize is broken
Changes by Gareth Rees : -- nosy: +benjamin.peterson ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment: This morning I noticed that I had forgotten to update the library reference, and I also noticed two more problems to add to the list above:

6. Although Lib/test/test_tokenize.py looks like it contains tests for backslash-newline handling, these tests are ineffective. Here they are:

    >>> roundtrip("x=1+\\n"
    ...           "1\\n"
    ...           "# This is a comment\\n"
    ...           "# This also\\n")
    True

    >>> roundtrip("# Comment \\nx = 0")
    True

There are two problems here: (i) because of the double string escaping, these are not backslash-newline, they are backslash-n; (ii) the roundtrip() test is too weak to detect this problem: tokenize() outputs an ERRORTOKEN for the backslash and untokenize() restores it, so the round-trip property is satisfied.

7. Problem 6 shows the difficulty of using doctests for this kind of test. It would be easier to ensure the correctness of these tests if the docstring was read from a separate file, so that at least the tests only need one level of string escaping.

I fixed problem 6 by updating these tests to use dump_tokens() instead of roundtrip(). I have not fixed problem 7 (like 4 and 5, I can leave it for another issue). Revised patch attached.

-- Added file: http://bugs.python.org/file33924/Issue12691.patch ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
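A sketch showing the ERRORTOKEN behaviour that makes the round-trip test too weak; the source below contains a backslash followed by the letter 'n', not a line continuation, and the exact output may vary slightly across versions:

    import io, tokenize

    source = "x=1+\\n1\n"  # chars 'x=1+', a backslash, 'n1', a newline
    readline = io.BytesIO(source.encode('utf-8')).readline
    for tok in tokenize.tokenize(readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))
    # The stray backslash is reported as ERRORTOKEN '\\', and
    # untokenize() writes it straight back out, so roundtrip() passes.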
[issue12691] tokenize.untokenize is broken
Changes by Gareth Rees : -- assignee: -> docs@python components: +Documentation, Tests nosy: +docs@python ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Changes by Gareth Rees : Removed file: http://bugs.python.org/file33919/Issue12691.patch ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment: I did some research on the cause of this issue. The assertion was added in this change by Jeremy Hylton in August 2006: <https://mail.python.org/pipermail/python-checkins/2006-August/055812.html> (The corresponding Mercurial commit is here: <http://hg.python.org/cpython/rev/cc992d75d5b3#l217.25>). At that point I believe the assertion was reasonable. I think it would have been triggered by backslash-continued lines, but otherwise it worked. But in this change <http://hg.python.org/cpython/rev/51e24512e305> in March 2008 Trent Nelson applied this patch by Michael Foord <http://bugs.python.org/file9741/tokenize_patch.diff> to implement PEP 263 and fix issue719888. The patch added ENCODING tokens to the output of tokenize.tokenize(). The ENCODING token is always generated with row number 0, while the first actual token is generated with row number 1. So now every token stream from tokenize.tokenize() sets off the assertion. The lack of a test case for tokenize.untokenize() in "full" mode meant that it was (and is) all too easy for someone to accidentally break it like this. -- ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20539] math.factorial may throw OverflowError
Gareth Rees added the comment: It's not a case of internal storage overflowing. The error is from Modules/mathmodule.c:1426 and it's the input 10**19 that's too large to convert to a C long. You get the same kind of error in other places where PyLong_AsLong or PyLong_AsInt is called on a user-supplied value, for example:

    >>> import pickle
    >>> pickle.dumps(10**19, 10**19)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long

-- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue20539> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
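For reference, the behaviour under discussion looks like this (the exact message depends on the platform's C long size):

    >>> import math
    >>> math.factorial(10**19)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long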
[issue20606] Operator Documentation Example doesn't work
Gareth Rees added the comment: The failing example is:

    d = {}
    keys = range(256)
    vals = map(chr, keys)
    map(operator.setitem, [d]*len(keys), keys, vals)

which works in Python 2 where map returns a list, but not in Python 3 where map returns an iterator. Doc/library/operator.rst follows the example with this note:

    .. XXX: find a better, readable, example

Additional problems with the example:

1. It's poorly motivated because a dictionary comprehension would be simpler and shorter:

    d = {i: chr(i) for i in range(256)}

2. It's also unclear why you'd need this dictionary when you could just call the function chr (but I suppose some interface might require a dictionary rather than a function).

3. To force the map to be evaluated, you need to write list(map(...)) which allocates an unnecessary list object and then throws it away. To avoid the unnecessary allocation you could use the "consume" recipe from the itertools documentation and write collections.deque(map(...), maxlen=0) but this is surely too obscure to use as an example.

I had a look through the Python sources, and made an Ohloh Code search for "operator.setitem" and I didn't find any good examples of its use, so I think the best thing to do is just to delete the example. <http://code.ohloh.net/search?s=%22operator.setitem%22&pp=0&fl=Python&mp=1&ml=1&me=1&md=1&ff=1&filterChecked=true>

-- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue20606> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
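The underlying gotcha: in Python 3, map() is lazy, so the setitem calls never run unless something consumes the iterator. A quick demonstration:

    >>> import operator
    >>> d = {}
    >>> m = map(operator.setitem, [d], [0], ['a'])
    >>> d                # nothing has happened yet
    {}
    >>> list(m)          # consuming the iterator performs the stores
    [None]
    >>> d
    {0: 'a'}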
[issue20606] Operator Documentation Example doesn't work
Changes by Gareth Rees : -- keywords: +patch Added file: http://bugs.python.org/file34059/operator.patch ___ Python tracker <http://bugs.python.org/issue20606> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map
Gareth Rees added the comment: Sorry about that; here it is. I had second thoughts about recommending zip() as an alternative (that would only work for cases where the None was constant; in other cases you might need lambda *args: args, but this seemed too complicated), so the note now says only:

    Note: In Python 3, map() does not accept None for the function
    argument.

-- keywords: +patch Added file: http://bugs.python.org/file34117/issue19363.patch ___ Python tracker <http://bugs.python.org/issue19363> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment: Thanks for your work on this, Terry. I apologise for the complexity of my original report, and will try not to do it again. -- ___ Python tracker <http://bugs.python.org/issue12691> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment:

> benchmarks show it to be more than twice as fast

I'm sure they do, but other benchmarks show it to be more than twice as slow. Try something like:

    iterables = [range(100)] + [()] * 100

-- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue20727> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
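For context, the recipe under discussion (referred to as roundrobin1 in the timings below) is the one from the itertools documentation, credited there to George Sakkis; the proposed replacement, roundrobin2, is in the patch attached to the issue and is not reproduced here:

    from itertools import cycle, islice

    def roundrobin1(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        pending = len(iterables)
        nexts = cycle(iter(it).__next__ for it in iterables)
        while pending:
            try:
                for next in nexts:
                    yield next()
            except StopIteration:
                # One iterable is exhausted: rebuild the cycle with the
                # remaining ones. This rebuild is what later turns out
                # to cost O(n) per exhausted iterable.
                pending -= 1
                nexts = cycle(islice(nexts, pending))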
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment: If 100 doesn't work for you, try a larger number. -- ___ Python tracker <http://bugs.python.org/issue20727> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment: I suspect I messed up the timing I did yesterday, because today I find that 100 isn't large enough, but here's what I found today (in Python 3.3):

    >>> from timeit import timeit
    >>> test = [tuple(range(300))] + [()] * 100
    >>> timeit(lambda:list(roundrobin1(*test)), number=1)  # old recipe
    8.386148632998811
    >>> timeit(lambda:list(roundrobin2(*test)), number=1)  # new recipe
    16.757110453007044

The new recipe is more than twice as slow as the old in this case, and its performance gets relatively worse as you increase the number 300. I should add that I do recognise that the new recipe is better for nearly all cases (it's simpler as well as faster), but I want to point out an important feature of the old recipe, namely that it discards iterables as they are finished with, giving it worst-case O(n) performance (albeit slow), whereas the new recipe has worst-case O(n^2). As we found out with hash tables, worst-case O(n^2) performance can be a problem when inputs are untrusted, so there are use cases where people might legitimately prefer an O(n) solution even if it's a bit slower in common cases.

-- ___ Python tracker <http://bugs.python.org/issue20727> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment: But now that I look at the code more carefully, the old recipe also has O(n^2) behaviour, because cycle(islice(nexts, pending)) costs O(n) and is called O(n) times. To have worst-case O(n) behaviour, you'd need something like this:

    from collections import deque

    def roundrobin3(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        nexts = deque(iter(it).__next__ for it in iterables)
        while nexts:
            try:
                while True:
                    yield nexts[0]()
                    nexts.rotate(-1)
            except StopIteration:
                nexts.popleft()

    >>> from timeit import timeit
    >>> test = [tuple(range(1000))] + [()] * 1000
    >>> timeit(lambda:list(roundrobin1(*test)), number=100)  # old recipe
    5.184364624001319
    >>> timeit(lambda:list(roundrobin2(*test)), number=100)  # new recipe
    5.139592286024708
    >>> timeit(lambda:list(roundrobin3(*test)), number=100)
    0.16217014100402594

-- ___ Python tracker <http://bugs.python.org/issue20727> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20774] collections.deque should ship with a stdlib json serializer
Gareth Rees added the comment: The JSON implementation uses these tests to determine how to serialize a Python object:

    isinstance(o, (list, tuple))
    isinstance(o, dict)

So any subclasses of list and tuple are serialized as a list, and any subclass of dict is serialized as an object. For example:

    >>> json.dumps(collections.defaultdict())
    '{}'
    >>> json.dumps(collections.OrderedDict())
    '{}'
    >>> json.dumps(collections.namedtuple('mytuple', ())())
    '[]'

When deserialized, you'll get back a plain dictionary or list, so there's no round-trip property here. The tests could perhaps be changed to:

    isinstance(o, collections.abc.Sequence)
    isinstance(o, collections.abc.Mapping)

I'm not a JSON expert, so I have no informed opinion on whether this is a good idea or not, but in any case, this change wouldn't help with deques, as a deque is not a Sequence. That's because deques don't have an index method (see issue10059 and issue12543).

-- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue20774> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
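In the meantime, the usual workaround is json.dumps's default hook, with the same lack of a round-trip property described above; a sketch (encode_extra is a name invented for this example):

    import collections
    import json

    def encode_extra(o):
        # Serialize deques as JSON arrays; let everything else fail.
        if isinstance(o, collections.deque):
            return list(o)
        raise TypeError("%r is not JSON serializable" % (o,))

    print(json.dumps(collections.deque([1, 2, 3]), default=encode_extra))
    # prints: [1, 2, 3]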
[issue20905] Adapt heapq push/pop/replace to allow passing a comparator.
Gareth Rees added the comment: It would be better to accept a key function instead of a comparison function (cf. heapq.nlargest and heapq.nsmallest). But note that this has been proposed before and rejected: see issue1904 where Raymond Hettinger provides this rationale:

    Use cases aside, there is another design issue in that the
    key-function approach doesn't work well with the heap functions on
    regular lists. Successive calls to heap functions will of necessity
    call the key-function multiple times for any given element. This
    contrasts with sort() where the whole purpose of the key function
    was to encapsulate the decorate-sort-undecorate pattern which was
    desirable because the key-function called exactly once per element.

However, in the case of the bisect module (where requests for a key function are also common), Guido was recently persuaded that there was a valid use case. See issue4356, and this thread on the Python-ideas mailing list: <https://mail.python.org/pipermail/python-ideas/2012-February/thread.html#13650> where Arnaud Delobelle points out that:

    Also, in Python 3 one can't assume that values will be comparable
    so the (key, value) tuple trick won't work: comparing the tuples
    may well throw a TypeError.

and Guido responds:

    Bingo. That clinches it. We need to add key=.

-- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue20905> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
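A quick demonstration of the (key, value) tuple trick failing in Python 3, as Arnaud describes (the exact wording of the TypeError varies between 3.x versions):

    >>> import heapq
    >>> heap = []
    >>> heapq.heappush(heap, (2, {'b': 1}))
    >>> heapq.heappush(heap, (2, {'a': 1}))  # equal keys: dicts compared
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: dict() < dict()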
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Is there any chance of making progress on this issue? Is there anything wrong with my patch? Did I omit any relevant point in my message of 2016-06-11 16:26? It would be nice if this were not left in limbo for another four years. -- ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: On Windows, under cmd.exe, you can use %errorlevel% to inspect the exit code of the last command.

-- ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment: Here's a patch that implements my proposal (1) -- under this patch, tokens read from an input stream belong to a subtype of str with startline and endline attributes giving the line numbers of the first and last character of the token. This allows the accurate reporting of error messages relating to a token. I updated the documentation and added a test case. -- keywords: +patch Added file: http://bugs.python.org/file46479/issue24869.patch ___ Python tracker <http://bugs.python.org/issue24869> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
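A minimal sketch of the str-subclass approach, using the attribute names from the description above (startline and endline); the attached patch is authoritative and may differ in detail:

    class Token(str):
        """A string that records which input lines it came from."""
        def __new__(cls, value, startline, endline):
            self = super().__new__(cls, value)
            self.startline = startline  # line number of first character
            self.endline = endline      # line number of last character
            return self

    tok = Token('"bar\nbaz"', 1, 2)
    print(repr(tok), tok.startline, tok.endline)  # still an ordinary str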
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Thanks for the revised patch, Mark. The new tests look good. -- ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Thank you, Mark (and everyone else who helped). -- ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
Gareth Rees added the comment: I've attached a revised patch that addresses Berker Peksag's concerns: 1. The message associated with the IndexError is now "address out of range" with no information about which address failed or why. 2. There's a new test case for an IndexError from an IPv6 address lookup. -- Added file: http://bugs.python.org/file43341/ipaddress.patch ___ Python tracker <http://bugs.python.org/issue20508> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Let's not allow the perfect to be the enemy of the good here. The issue I reported is a very specific one: in Python 2.7, if you pass a long to sys.exit, then the value of the long is not used as the exit code. This is bad because functions like os.spawnv that return exit codes (that you might reasonably want to pass on to sys.exit) can return them as long. My patch only proposes to address this one issue. In order to keep the impact as small as possible, I do not propose to make any other changes, or address any other problems. But in the comments here people have brought up THREE other issues:

1. Alexander Belopolsky expresses the concern that "(int)PyLong_AsLong(value) can silently convert non-zero error code to zero." This is not a problem introduced by my patch -- the current code is:

    exitcode = (int)PyInt_AsLong(value);

which has exactly the same problem (because PyIntObject stores its value as a long). So this concern (even if valid) is not a reason to reject my patch.

2. Ethan Furman wrote: "we need to protect against overflow from <long> to <int>". But again, this is not a problem introduced by my patch. The current code says:

    exitcode = (int)PyInt_AsLong(value);

and my patch does not change this line. The possibility of this overflow is not a reason to reject my patch.

3. Alexander says, "Passing anything other than one of the os.EX_* constants to sys.exit() is a bad idea." First, this is not a problem introduced by my patch. The existing code in Python 2.7 allows you to specify other exit codes. So this problem (if it is a problem) is not a reason to reject my patch. Second, this claim is surely not right -- when a subprocess fails it often makes sense to pass on the exit code of the subprocess, whatever that is. This is exactly the use case that I mentioned in my original report (that is, passing on the exit code from os.spawnv to sys.exit).

-- ___ Python tracker <http://bugs.python.org/issue14376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
Gareth Rees added the comment: Thank you for applying this patch. -- ___ Python tracker <http://bugs.python.org/issue20508> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue27306] Grammatical Error in Documentation - Tarfile page
Gareth Rees added the comment: Here's a patch improving the grammar in the tarfile documentation. -- keywords: +patch nosy: +Gareth.Rees Added file: http://bugs.python.org/file43375/issue27306.patch ___ Python tracker <http://bugs.python.org/issue27306> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment: Just to restate the problem: The use case is that when emitting an error message for a token, we want to include the number of the line containing the token (or the number of the line where the token started, if the token spans multiple lines, as it might if it's a string containing newlines). But there is no way to satisfy this use case given the features of the shlex module. In particular, shlex.lineno (which looks as if it ought to help) is actually the line number of the first character that has not yet been consumed by the lexer, and in general this is not the same as the line number of the previous (or the next) token. I can think of two alternatives that would satisfy the use case: 1. Instead of returning tokens as str objects, return them as instances of a subclass of str that has a property that gives the line number of the first character of the token. (Maybe it should also have properties for the column number of the first character, and the line and column number of the last character too? These properties would support better error messages.) 2. Add new methods that return tuples giving the token and its line number (and possibly column number etc. as in alternative 1). My preference would be for alternative (1), but I suppose there is a very tiny risk of breaking some code that relied upon get_token returning an instance of str exactly rather than an instance of a subclass of str. -- nosy: +Gareth.Rees ___ Python tracker <http://bugs.python.org/issue24869> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment: A third alternative: 3. Add a method whose effect is to consume comments and whitespace, but which does not yield a token. You could then call this method, and then look at shlex.lineno, which will be the line number of the first character of the next token (if there is a next token). -- ___ Python tracker <http://bugs.python.org/issue24869> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue27588] Type objects are hashable and comparable for equality but this is not documented
New submission from Gareth Rees: The type objects constructed by the metaclasses in the typing module are hashable and comparable for equality:

    >>> from typing import *
    >>> {Mapping[str, int], Mapping[int, str]}
    {typing.Mapping[int, str], typing.Mapping[str, int]}
    >>> Union[str, int, float] == Union[float, int, str]
    True
    >>> List[int] == List[float]
    False

but this is not clearly documented in the documentation for the typing module (there are a handful of examples using equality, but it's not explicit that these are runnable). It would be nice if there were explicit documentation for these properties of type objects.

-- assignee: docs@python components: Documentation messages: 270981 nosy: Gareth.Rees, docs@python priority: normal severity: normal status: open title: Type objects are hashable and comparable for equality but this is not documented type: enhancement versions: Python 3.5, Python 3.6 ___ Python tracker <http://bugs.python.org/issue27588> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com