[issue39165] Completeness and symmetry in RE, avoid `findall(...)[0]`
New submission from Juancarlo Añez:

The problematic `findall(...)[0]` is a common anti-pattern in Python programs. The reason is a lack of symmetry and completeness in the `re` module.

The original proposal in `python-ideas` was to add `re.findfirst(pattern, string, flags=0, default=_mark)` with more or less the semantics of `next(iter(findall(pattern, string, flags=flags)), default)`.

The referenced PR adds `findalliter(pattern, string, flags=0)` with the value semantics of `findall()` over a generator, implements `findall()` as `return list(findalliter(...))`, and implements `findfirst()`. Consistency and correctness are likely, because all tests pass with the redefined `findall()`.

--
components: Library (Lib)
messages: 359039
nosy: apalala
priority: normal
pull_requests: 17191
severity: normal
status: open
title: Completeness and symmetry in RE, avoid `findall(...)[0]`
type: enhancement
versions: Python 3.8

___
Python tracker <https://bugs.python.org/issue39165>
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
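A minimal sketch of the three functions as described above, written as a standalone module rather than as part of `re` (this is not the code from the referenced PR). It follows the documented value rules of `findall()`: whole matches when the pattern has no groups, the group itself when there is one group, and a tuple when there are several, with `''` for groups that did not participate. `_NO_DEFAULT` stands in for the proposal's `_mark`; raising `ValueError` when no default is given is an assumption, since the proposal does not say what happens in that case.

```python
import re
from typing import Iterator, Union

_NO_DEFAULT = object()  # sentinel: lets callers pass default=None explicitly


def findalliter(pattern, string, flags=0) -> Iterator[Union[str, tuple]]:
    """Lazily yield the same values that re.findall() would return."""
    regex = re.compile(pattern, flags)
    for m in regex.finditer(string):
        if regex.groups == 0:
            yield m.group(0)                # no groups: the whole match
        elif regex.groups == 1:
            yield m.groups(default='')[0]   # one group: just that group
        else:
            yield m.groups(default='')      # several groups: a tuple


def findall(pattern, string, flags=0):
    """findall() expressed in terms of the generator."""
    return list(findalliter(pattern, string, flags))


def findfirst(pattern, string, flags=0, default=_NO_DEFAULT):
    """Return what findall(...)[0] would, without building the whole list."""
    result = next(findalliter(pattern, string, flags), default)
    if result is _NO_DEFAULT:
        # Assumption of this sketch: with no default, a missing match is an error.
        raise ValueError(f'no match for {pattern!r}')
    return result


print(findfirst(r'(\d+)-(\d+)', 'a 10-20 b 30-40'))  # ('10', '20')
print(findfirst(r'\d+', 'no digits', default='0'))    # 0
```

Because `findfirst()` pulls a single item from the generator, scanning stops at the first match instead of materializing every match first.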
[issue39165] Completeness and symmetry in RE, avoid `findall(...)[0]`
Juancarlo Añez added the comment:

The discussion on python-ideas favored the inclusion of `findfirst()`. At any rate, not having a generator version of `findall()` is an important omission.

Another user's search of public GitHub repositories found that `findall(...)[0]` is prevalent. python-ideas agreed that the cause was the incompleteness/asymmetry in `re`.
[issue39165] Completeness and symmetry in RE, avoid `findall(...)[0]`
Juancarlo Añez added the comment:

There's no way to assert that `findall(...)[0]` is efficient enough in most cases. It is easy to see that it is risky in every case, as runtime may be exponential, and memory O(len(input)). A mistake in the regular expression may easily result in an out-of-memory error, which can only be debugged with a series of tests using `search()`.

A problem with `re.search(...)` is that it doesn't have the return-value semantics of `findall(...)[0]`, and those semantics seem to be what appeals to Python programmers. It takes several lines of code (the ones in `findalliter()`) to get the same result as `findall(...)[0]` when using `search()`. `findall()` is the sole, lonely function in `re` with its return-value semantics.

Also, this proposal embeds `first()` within the body of `findfirst(...)`, but given that implementation, one should consider whether `first()` should be part of `itertools`, perhaps with a different name, like `take_one()`. One should also consider that although third-party extensions to `itertools` already provide the equivalent of `first()`, `findalliter()` and `findfirst()` do not belong there, and there are no mainstream third-party extensions to `re` where they would fit.
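A minimal sketch of the kind of `first()` helper mentioned above (the name follows the message; `take_one()` would behave the same). Applied to `re.finditer()`, it gives the laziness that `findall(...)[0]` lacks: only the first match is ever computed. Raising `ValueError` when the iterable is empty and no default is given is an assumption of this sketch; among third-party packages, `more_itertools.first()` takes a similar `(iterable, default)` pair.

```python
import re

_MISSING = object()  # sentinel: distinguishes "no default" from default=None


def first(iterable, default=_MISSING):
    """Return the first item of iterable, or default if it is empty.

    Raises ValueError when the iterable is empty and no default is given.
    """
    for item in iterable:
        return item  # stop after one item; the rest is never produced
    if default is _MISSING:
        raise ValueError('first() of an empty iterable with no default')
    return default


text = 'a1 b22 c333'
# finditer() is lazy, so only the first match is computed here.
print(first(m.group() for m in re.finditer(r'\d+', text)))        # 1
print(first((m.group() for m in re.finditer(r'x', text)), None))  # None
```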
[issue39165] Completeness and symmetry in RE, avoid `findall(...)[0]`
Juancarlo Añez added the comment:

The underlying problem, as I see it, is that, historically, `re.search()` returns `None` when there is no match, instead of returning a `Match` object that is consistent with "no match" (evaluates to `False`, etc.).

This seems too difficult to repair, as so much existing code relies on those semantics (`if match is None` is the risky bit). Hence `findall()`, `findalliter()`, and `findfirst()`.
[issue39165] Completeness and symmetry in RE, avoid `findall(...)[0]`
Juancarlo Añez added the comment:

The analysis done by Terry overlooks the fact that `search(...)` returns `None` when there is no match, so indexing or calling methods on its result is not safe code.

`findall()` returns an empty list when there is no match.
`findalliter()` returns an empty iterator when there is no match.
`findfirst()` may return a `default` value when there is no match.

If `search()` is proposed to replace `findall()[0]`, then the idiom has to be (depending on the case):

    m[0] if (m := re.search(...)) else '0'
    m.groups() if (m := re.search(...)) else '0'

In contrast, `findfirst()` returns a value that is the same as `findall()[0]` when there is a match, or a `default` if there is no match. With `findall()`, the idiom is:

    m[0] if (m := re.findall(...)) else '0'

Compare with:

    findfirst(..., default='0')
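The idioms above can be checked directly (Python 3.8+ for `:=`). This minimal sketch uses an illustrative `text` and patterns. It shows that the `search()` and `findall()` idioms agree for a pattern without groups, but not once the pattern has a group: `m[0]` on a match object is always the whole match, while `findall()` yields the group itself.

```python
import re

text = 'id=42; id=7'

# No groups: both idioms give the whole first match.
via_search = m[0] if (m := re.search(r'id=\d+', text)) else '0'
via_findall = m[0] if (m := re.findall(r'id=\d+', text)) else '0'
print(via_search, via_findall)  # id=42 id=42

# One group: search()'s m[0] is still the whole match, so the group needs
# m[1]; findall() already returns just the group.
whole = m[0] if (m := re.search(r'id=(\d+)', text)) else '0'
group = m[1] if (m := re.search(r'id=(\d+)', text)) else '0'
first_found = m[0] if (m := re.findall(r'id=(\d+)', text)) else '0'
print(whole, group, first_found)  # id=42 42 42

# No match: the idiom falls back to the default.
missing = m[0] if (m := re.findall(r'x=(\d+)', text)) else '0'
print(missing)  # 0
```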
[issue17343] Add a version of str.split which returns an iterator
Juancarlo Añez added the comment:

import re


def isplit(text, sep=None, maxsplit=-1):
    """
    A low-memory-footprint version of:

        iter(text.split(sep, maxsplit))

    Adapted from https://stackoverflow.com/a/9770397
    """
    if maxsplit == 0:
        yield text
    else:
        # With no separator, runs of whitespace separate the fields.
        rsep = re.escape(sep) if sep else r'\s+'
        # Each match is the start of the text or a separator, followed by
        # the longest run of characters that does not begin a separator.
        regex = fr'(?:^|{rsep})((?:(?!{rsep}).)*)'
        # DOTALL lets '.' cross newlines, as str.split() does.
        for n, p in enumerate(re.finditer(regex, text, re.DOTALL)):
            if 0 <= maxsplit <= n:
                # maxsplit reached: the rest of the text is the last field.
                yield p.string[p.start(1):]
                return
            yield p.group(1)

--
nosy: +apalala

___
Python tracker <https://bugs.python.org/issue17343>
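With `sep=None`, the regex recipe above yields empty strings for leading or trailing whitespace (for `' a'` it yields `''` and then `'a'`, whereas `' a'.split()` gives `['a']`). A minimal alternative sketch, not the recipe from the linked answer: it swaps the regex for `re.finditer(r'\S+', ...)` in the whitespace case and for a `str.find()` loop when an explicit separator is given.

```python
import re
from typing import Iterator, Optional


def isplit(text: str, sep: Optional[str] = None, maxsplit: int = -1) -> Iterator[str]:
    """Lazily yield the same items as text.split(sep, maxsplit)."""
    if sep is None:
        # Whitespace mode: fields are runs of non-whitespace, so leading and
        # trailing whitespace never produce empty strings.
        for n, m in enumerate(re.finditer(r'\S+', text)):
            if 0 <= maxsplit <= n:
                yield text[m.start():]  # the unsplit remainder is the last field
                return
            yield m.group()
        return
    if not sep:
        raise ValueError('empty separator')  # str.split('') also rejects this
    start, n = 0, 0
    while maxsplit < 0 or n < maxsplit:
        i = text.find(sep, start)
        if i < 0:
            break
        yield text[start:i]
        start, n = i + len(sep), n + 1
    yield text[start:]  # whatever follows the last separator used
```

Only the current field is held in memory at a time; the input string itself is never copied into a list.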
[issue14902] test_logging failed
Juancarlo Añez added the comment:

My local time zone is VET (time.tzname == ('VET', 'VET')), and test_logging fails because time.timezone is off by 30 minutes. I couldn't find the cause of the problem with time.timezone, but logging is not to blame. I'm running the tests on Ubuntu 12.04 AMD64, which handles my time zone correctly throughout.

I'm submitting a patch that allows test_logging to pass by not relying on time.timezone.

--
keywords: +patch
nosy: +apalala
Added file: http://bugs.python.org/file26224/test_logging_wo_timezone.patch

___
Python tracker <http://bugs.python.org/issue14902>
[issue14902] test_logging failed
Juancarlo Añez added the comment:

@Vinay The test *is* broken in theory, because it uses today's time.timezone to make calculations over a datetime in the past (1993), even though official time zones have changed in recent years for Caracas, Moscow, and others: http://www.timeanddate.com/news/time/. As it is, the test will pass in some locations and fail in others, even if time.timezone is correct.

Whether time.timezone is wrong for certain locations is a separate issue, which I will post as soon as I complete the unit test. I took a look at Modules/timemodule.c, and there seems to be nothing wrong there.

In short, the bug is: test_time() incorrectly uses the current time.timezone to make calculations over dates in the past.
[issue14902] test_logging failed
Juancarlo Añez added the comment:

> And datetime.datetime.now().tzinfo is always None.

I can reproduce that.
[issue14902] test_logging failed
Juancarlo Añez added the comment:

I did extensive testing on time.timezone, and it is correct as far as the current date is concerned. The problem, as mentioned before, is that test_logging is using time.timezone for dates in the past, for which the time zone at the current location may have been different from the current one.

The attached patch shows that time calculations involving time.timezone may not be valid for dates other than the current one, as not even daylight-saving/summer time is taken into account, so the test may also fail depending on the time of year at which it is run.

--
Added file: http://bugs.python.org/file26246/test_timezones.patch
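One way to see the problem is to ask the platform for the offset that applied at a specific instant and compare it with `time.timezone`. A minimal sketch, assuming a platform whose time zone database records historical rules; `utc_offset_at` is a hypothetical helper, and the 1993 instant is arbitrary. In a zone whose rules have changed since then, such as the Caracas example in this thread, the first and last values printed can differ even though each is correct for its own date.

```python
import time
from datetime import datetime, timezone


def utc_offset_at(ts: float) -> int:
    """Local UTC offset, in seconds east of UTC, in effect at POSIX time ts.

    time.localtime() applies the platform's rules for that particular date,
    including DST, whereas time.timezone is one fixed value (in seconds *west*
    of UTC) for local standard time under the current rules.
    """
    return time.localtime(ts).tm_gmtoff


# An arbitrary instant in 1993, chosen only to illustrate a past date.
past = datetime(1993, 6, 1, 12, 0, tzinfo=timezone.utc).timestamp()

print('offset on that date:', utc_offset_at(past))
print('offset right now   :', utc_offset_at(time.time()))
print('-time.timezone     :', -time.timezone)
```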
[issue14902] test_logging failed
Juancarlo Añez added the comment:

@Vinay No reason. datetime.astimezone(None) is documented in 3.3. You may even use:

    r.created = time.mktime(dt.astimezone().timetuple())
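A minimal sketch comparing the approach suggested above with two conversions that do not consult the local zone at all. The 1993 instant is arbitrary, and `dt` and the `created_*` names are illustrative rather than taken from test_logging.

```python
import calendar
import time
from datetime import datetime, timezone

# An arbitrary instant in 1993, expressed as an aware UTC datetime.
dt = datetime(1993, 6, 1, 12, 0, tzinfo=timezone.utc)

# The approach suggested above: convert to local time (using the offset that
# applied on that date) and let mktime() turn the local wall time back into a
# POSIX timestamp. mktime() may have to guess around DST transitions.
created_local = time.mktime(dt.astimezone().timetuple())

# Zone-independent alternatives: neither one looks at local time at all.
created_aware = dt.timestamp()
created_utc = calendar.timegm(dt.utctimetuple())

print(created_local, created_aware, created_utc)
```

None of the three subtracts `time.timezone`, so none depends on whether the current offset matches the one in effect in 1993.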
[issue14902] test_logging failed
Changes by Juancarlo Añez:

--
type: compile error -> behavior
[issue15247] io.open() is inconsistent re os.open()
New submission from Juancarlo Añez:

>>> import io
>>> d = io.open('.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IsADirectoryError: [Errno 21] Is a directory: '.'
>>>
>>> import os
>>> d = io.open(os.open('.',0))
>>> d
<_io.TextIOWrapper name=3 mode='r' encoding='UTF-8'>
>>>

--
components: Library (Lib)
messages: 164633
nosy: apalala
priority: normal
severity: normal
status: open
title: io.open() is inconsistent re os.open()
type: behavior
versions: Python 2.7, Python 3.3

___
Python tracker <http://bugs.python.org/issue15247>
[issue15247] io.open() is inconsistent re os.open()
Juancarlo Añez added the comment:

io.open() clearly doesn't care about opening directories as long as they are passed as os.open() file descriptors. Quite unexpected!
[issue15247] io.open() is inconsistent re os.open()
Juancarlo Añez added the comment:

Note that subsequent operations on the returned object do raise IsADirectoryError.

>>> import io
>>> import os
>>> d = io.open(os.open('.',0))
>>> d.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IsADirectoryError: [Errno 21] Is a directory
>>>
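A small probe for the behavior discussed above, runnable on POSIX systems (`os.open()` on a directory fails on Windows). `where_directory_fd_fails` is a hypothetical helper that reports which step raises `IsADirectoryError`. The result may depend on the Python version: in the sessions above, only `read()` failed.

```python
import contextlib
import io
import os


def where_directory_fd_fails(path: str = '.') -> str:
    """Report which step rejects a directory opened through os.open().

    Returns 'open' if io.open() rejects the descriptor, 'read' if only the
    later read() fails, or 'none' if neither raises IsADirectoryError.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        f = io.open(fd)
    except IsADirectoryError:
        # Close our descriptor in case io.open() left it open; ignore the
        # error if it was already closed.
        with contextlib.suppress(OSError):
            os.close(fd)
        return 'open'
    with f:  # f now owns fd and closes it on exit
        try:
            f.read()
        except IsADirectoryError:
            return 'read'
    return 'none'


print(where_directory_fd_fails('.'))
```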
[issue15558] webbrowser output to console
New submission from Juancarlo Añez:

Under Ubuntu Linux 11.10 and 12.04, webbrowser.open() outputs the following message to the console:

Created new window in existing browser session.

The behavior is both unexpected and troublesome.

--
components: Library (Lib)
messages: 167443
nosy: apalala
priority: normal
severity: normal
status: open
title: webbrowser output to console
type: behavior
versions: Python 2.7, Python 3.2

___
Python tracker <http://bugs.python.org/issue15558>
[issue15618] turtle.pencolor() chokes on unicode
New submission from Juancarlo Añez:

>>> t.pencolor(u'red')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in pencolor
  File "/usr/lib/python2.7/lib-tk/turtle.py", line 2166, in pencolor
    color = self._colorstr(args)
  File "/usr/lib/python2.7/lib-tk/turtle.py", line 2600, in _colorstr
    return self.screen._colorstr(args)
  File "/usr/lib/python2.7/lib-tk/turtle.py", line , in _colorstr
    r, g, b = [round(255.0*x) for x in (r, g, b)]
TypeError: can't multiply sequence by non-int of type 'float'

--
components: Library (Lib)
messages: 167883
nosy: apalala
priority: normal
severity: normal
status: open
title: turtle.pencolor() chokes on unicode
type: behavior
versions: Python 2.7

___
Python tracker <http://bugs.python.org/issue15618>
[issue15618] turtle.pencolor() chokes on unicode
Juancarlo Añez added the comment:

This patch solves the problem by making turtle check string arguments against basestring instead of str.

--
keywords: +patch
Added file: http://bugs.python.org/file26758/turtle_unicode.patch
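The fix described above amounts to testing against `basestring` (the common base of `str` and `unicode` in Python 2) rather than `str`. A minimal sketch of that check in a form that also runs on Python 3, where `basestring` no longer exists; the helper name is hypothetical and this is not the attached patch.

```python
try:
    string_types = basestring  # Python 2: covers both str and unicode
except NameError:
    string_types = str  # Python 3: every str is already Unicode


def looks_like_color_name(arg):
    """Hypothetical check: treat any text argument as a color name."""
    return isinstance(arg, string_types)


print(looks_like_color_name(u'red'))            # True
print(looks_like_color_name((1.0, 0.0, 0.0)))   # False
```

With a plain `isinstance(arg, str)` test under Python 2, `u'red'` fails the check and falls through to the numeric color path, which is where the traceback above comes from.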
[issue15618] turtle.pencolor() chokes on unicode
Juancarlo Añez added the comment:

The bug showed up in a script that used:

    from __future__ import unicode_literals
[issue15620] readline.clear_history() missing in test_readline.py
New submission from Juancarlo Añez:

$ lsb_release -a
LSB Version:    core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.1 LTS
Release:        12.04
Codename:       precise
$ hg branch
2.7
$ ./python Lib/test/test_readline.py
testHistoryUpdates (__main__.TestHistoryManipulation) ... ERROR

======================================================================
ERROR: testHistoryUpdates (__main__.TestHistoryManipulation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_readline.py", line 16, in testHistoryUpdates
    readline.clear_history()
AttributeError: 'module' object has no attribute 'clear_history'

----------------------------------------------------------------------
Ran 1 test in 0.003s

FAILED (errors=1)
Traceback (most recent call last):
  File "Lib/test/test_readline.py", line 43, in <module>
    test_main()
  File "Lib/test/test_readline.py", line 40, in test_main
    run_unittest(TestHistoryManipulation)
  File "/art/python/cpython/Lib/test/test_support.py", line 1125, in run_unittest
    _run_suite(suite)
  File "/art/python/cpython/Lib/test/test_support.py", line 1108, in _run_suite
    raise TestFailed(err)
test.test_support.TestFailed: Traceback (most recent call last):
  File "Lib/test/test_readline.py", line 16, in testHistoryUpdates
    readline.clear_history()
AttributeError: 'module' object has no attribute 'clear_history'

--
components: Tests
messages: 167919
nosy: apalala
priority: normal
severity: normal
status: open
title: readline.clear_history() missing in test_readline.py
type: behavior
versions: Python 2.7

___
Python tracker <http://bugs.python.org/issue15620>
[issue15620] readline.clear_history() missing in test_readline.py
Juancarlo Añez added the comment:

$ dpkg -l | grep readline
ii  libreadline-dev    6.2-8   GNU readline and history libraries, development files
ii  libreadline5       5.2-11  GNU readline and history libraries, run-time libraries
ii  libreadline6       6.2-8   GNU readline and history libraries, run-time libraries
ii  libreadline6-dev   6.2-8   GNU readline and history libraries, development files
ii  readline-common    6.2-8   GNU readline and history libraries, common files
[issue15620] readline.clear_history() missing in test_readline.py
Juancarlo Añez added the comment:

Check if clear_history() is available before calling it.

--
keywords: +patch
Added file: http://bugs.python.org/file26761/readline_clear_history_available.patch
[issue15620] readline.clear_history() missing in test_readline.py
Changes by Juancarlo Añez:

Removed file: http://bugs.python.org/file26761/readline_clear_history_available.patch
[issue15620] readline.clear_history() missing in test_readline.py
Juancarlo Añez added the comment:

Check if clear_history() is available before calling it.

--
Added file: http://bugs.python.org/file26762/readline_clear_history_available.patch
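A minimal sketch of the guard described in the patch message, assuming the check is done at call time with `getattr`. The function name and its optional `module` parameter (useful for testing with a stand-in object) are hypothetical, and the attached patch may implement the check differently, for example by skipping the test instead.

```python
def clear_history_if_supported(module=None):
    """Call clear_history() only when this readline build provides it.

    Returns True if history was cleared, False if the function (or the
    readline module itself) is unavailable.
    """
    if module is None:
        try:
            import readline as module  # not every platform/build has readline
        except ImportError:
            return False
    clear = getattr(module, 'clear_history', None)
    if clear is None:
        return False  # e.g. a readline library built without this function
    clear()
    return True
```

Returning a flag lets a caller such as a test decide whether to skip the history assertions rather than failing with `AttributeError`.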