[issue30670] pprint for dict in sorted order or insert order?
Philippe added the comment: IMHO, since Guido said that dictionary order is guaranteed going forward [1], the order should now be preserved in 3.6+. Getting a sorted pprint'ed dict in Py3.6.1 was a surprise to me coming from Python 2. The default ordered dict is, to me, the killer feature of Python 3 and a major driver to migrate from 2. [1] https://mail.python.org/pipermail/python-dev/2017-December/151286.html -- nosy: +pombreda ___ Python tracker <https://bugs.python.org/issue30670> ___
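[Editor's note] The difference is easy to see side by side; the sort_dicts keyword shown below only arrived later, in Python 3.8, precisely to allow insertion-order output:

```python
import pprint

d = {"b": 2, "a": 1, "c": 3}

# Python 3.6/3.7: pprint always sorts dict keys, hiding insertion order.
pprint.pprint(d)                    # {'a': 1, 'b': 2, 'c': 3}

# Since Python 3.8, sort_dicts=False preserves insertion order.
pprint.pprint(d, sort_dicts=False)  # {'b': 2, 'a': 1, 'c': 3}
```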
[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)
New submission from Philippe: The extraction fails when calling tarfile.open on this archive: http://archive.apache.org/dist/commons/logging/source/commons-logging-1.1.2-src.tar.gz After some investigation: the file can be extracted with GNU tar and bsdtar, and the gzip compression is not the issue; if I gunzip the tar.gz to a tar and call tarfile on the plain tar, the problem is the same. Also, this archive was most likely created on Windows (based on the `file` command output) using some Java tools per http://commons.apache.org/proper/commons-logging/building.html from these original files: http://svn.apache.org/repos/asf/commons/proper/logging/tags/LOGGING_1_1_2/ ... that's all I could find out. The error trace is slightly different on 2.7 and 3.4, but similar. The problem has been verified on Linux 64 with Python 2.7 and 3.4, and on Windows with Python 2.7.

On 2.7:

>>> TarFile.taropen(name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/tarfile.py", line 1705, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python2.7/tarfile.py", line 1574, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python2.7/tarfile.py", line 2335, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

On 3.4:

>>> TarFile.taropen(name)
Traceback (most recent call last):
  File "/usr/lib/python3.4/tarfile.py", line 180, in nti
    n = int(nts(s, "ascii", "strict") or "0", 8)
ValueError: invalid literal for int() with base 8: ' '

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.4/tarfile.py", line 2248, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib/python3.4/tarfile.py", line 1083, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/usr/lib/python3.4/tarfile.py", line 1032, in frombuf
    obj.uid = nti(buf[108:116])
  File "/usr/lib/python3.4/tarfile.py", line 182, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/tarfile.py", line 1595, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.4/tarfile.py", line 1469, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python3.4/tarfile.py", line 2260, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

-- components: Library (Lib) files: commons-logging-1.1.2-src.tar.gz messages: 245839 nosy: lars.gustaebel, pombreda priority: normal severity: normal status: open title: tarfile fails to extract archive (handled fine by gnu tar and bsdtar) versions: Python 2.7, Python 3.4 Added file: http://bugs.python.org/file39814/commons-logging-1.1.2-src.tar.gz ___ Python tracker <http://bugs.python.org/issue24514> ___
[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)
Philippe added the comment: Note: the tracebacks above are from calling taropen on the gunzipped tar.gz. The errors are similar, but a tad less informative, when using the .tgz and open(). -- ___ Python tracker <http://bugs.python.org/issue24514> ___
[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)
Philippe added the comment: lars: you are my hero! you rock. I picture you being able to read through tar binary headers while you sleep. I am in awe. -- ___ Python tracker <http://bugs.python.org/issue24514> ___
[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)
Philippe added the comment: I verified that the patch issue24514.diff (adding .rstrip()) also works on Python 2.7. I verified it works on Python 3.4 as well. I ran it on 2.7 against a fairly large test suite of tar files without problems. This is a +1 for me. Lars: do you think you could apply it to 2.7 too? -- ___ Python tracker <http://bugs.python.org/issue24514> ___
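[Editor's note] For context, a rough sketch of what the rstrip() fix addresses, assuming (as the tracebacks suggest) that this archive pads numeric header fields such as uid with spaces instead of NULs; the function name is illustrative, not the actual patch:

```python
def parse_octal_field(field: bytes) -> int:
    # Tar numeric header fields hold octal digits, normally terminated
    # by a NUL or space. This archive pads them unusually, so strip
    # trailing spaces and NULs before converting to avoid the
    # "invalid header" ReadError seen above.
    s = field.decode("ascii", "strict").rstrip(" \x00")
    return int(s or "0", 8)

print(parse_octal_field(b"0000764 \x00"))  # 500, i.e. octal 0764
```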
[issue23929] Minor typo (way vs. away) in os.renames docstring
New submission from Philippe: There is a minor typo in the docstring of os.renames: See https://hg.python.org/cpython/file/37905786b34b/Lib/os.py#l275 "After the rename, directories corresponding to rightmost path segments of the old name will be pruned way until [..]" This should be using away instead of way: "After the rename, directories corresponding to rightmost path segments of the old name will be pruned away until [..]" -- assignee: docs@python components: Documentation messages: 240595 nosy: docs@python, pombreda priority: normal severity: normal status: open title: Minor typo (way vs. away) in os.renames docstring type: enhancement versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6 ___ Python tracker <http://bugs.python.org/issue23929> ___
[issue8033] sqlite: broken long integer handling for arguments to user-defined functions
Philippe Devalkeneer added the comment: Hello, Here is a patch to fix it :) (and don't blame me too much if something is not correct, it's the first patch I submit :)) Philippe -- keywords: +patch nosy: +flupke Added file: http://bugs.python.org/file19511/broken_long_int_sqlite_functions.diff ___ Python tracker <http://bugs.python.org/issue8033> ___
[issue8033] sqlite: broken long integer handling for arguments to user-defined functions
Philippe Devalkeneer added the comment: The regression tests in test_sqlite no longer pass with the patch; I will check in more detail why... sorry. -- ___ Python tracker <http://bugs.python.org/issue8033> ___
[issue8033] sqlite: broken long integer handling for arguments to user-defined functions
Philippe Devalkeneer added the comment: No, I made a mistake in my build; test_sqlite actually runs fine before and after the patch. -- ___ Python tracker <http://bugs.python.org/issue8033> ___
[issue8033] sqlite: broken long integer handling for arguments to user-defined functions
Philippe Devalkeneer added the comment: OK, I will redo a patch in the next few days, possibly using the sqlite3_int64 type, and include tests. Thank you for the review. -- ___ Python tracker <http://bugs.python.org/issue8033> ___
[issue8033] sqlite: broken long integer handling for arguments to user-defined functions
Philippe Devalkeneer added the comment: Here is a new patch with the sqlite3_int64 type, and a unit test. -- Added file: http://bugs.python.org/file20313/broken_long_sqlite_userfunctions.diff ___ Python tracker <http://bugs.python.org/issue8033> ___
[issue28937] str.split(): allow removing empty strings (when sep is not None)
Philippe Cloutier added the comment: I understood the current (only) behavior, but coming from a PHP background, I really didn't expect it. Thank you for this request; I would definitely like the ability to get behavior matching PHP's explode(). -- nosy: +Philippe Cloutier title: str.split(): remove empty strings when sep is not None -> str.split(): allow removing empty strings (when sep is not None) ___ Python tracker <https://bugs.python.org/issue28937> ___
[issue28937] str.split(): allow removing empty strings (when sep is not None)
Philippe Cloutier added the comment: I assume the "workaround" suggested by Raymond in msg282966 is supposed to read filter(None, str.split(sep)) rather than filter(None, sep.split(input)). -- ___ Python tracker <https://bugs.python.org/issue28937> ___
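[Editor's note] For readers landing here, the corrected workaround in action (plain stdlib, no extra dependencies):

```python
s = "a,,b,,,c"

print(s.split(","))                     # ['a', '', 'b', '', '', 'c']
print(list(filter(None, s.split(","))))  # ['a', 'b', 'c']

# Equivalent and arguably more explicit:
print([part for part in s.split(",") if part])
```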
[issue30349] Preparation for advanced set syntax in regular expressions
Philippe Ombredanne added the comment: FWIW, this warning is annoying because it is hard to fix when the regexes are sourced from data: the warning message does not include the regex at fault. It should; otherwise the warning is noisy and ineffective IMHO. -- nosy: +pombredanne ___ Python tracker <https://bugs.python.org/issue30349> ___
[issue30349] Preparation for advanced set syntax in regular expressions
Philippe Ombredanne added the comment: Sorry, my comment was at best nonsensical gibberish! I meant to say that this warning message should include the actual regex at fault; otherwise it is hard to fix when the regex in question comes from some data structure like a list: the line number where the warning occurs is not enough to find the issue, and the code needs to be instrumented first to catch the warning, which is rather heavy-handed for handling a warning. -- ___ Python tracker <https://bugs.python.org/issue30349> ___
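[Editor's note] Until the message includes the pattern, one workaround is to compile each data-driven regex under warnings.catch_warnings and report the offending pattern yourself; a sketch, where the patterns list is hypothetical ('[[:alpha:]]' is POSIX-class syntax that triggers the "Possible nested set" FutureWarning on Python versions that emit it):

```python
import re
import warnings

patterns = [r"[[:alpha:]]+", r"[a-z]+"]  # hypothetical data-driven regexes

for pat in patterns:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        re.compile(pat)
    for w in caught:
        # Associates each warning with the pattern that produced it.
        print(f"pattern {pat!r}: {w.category.__name__}: {w.message}")
```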
[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string
Philippe Ombredanne added the comment: There is a weird thing though (using Python 3.6.8):

>>> [x.lower() for x in 'İ']
['i̇']
>>> [x for x in 'İ'.lower()]
['i', '̇']

I would expect the results to be the same in both cases. (And this is a source of a bug in some code of mine.) -- nosy: +pombredanne ___ Python tracker <https://bugs.python.org/issue34723> ___
[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string
Philippe Ombredanne added the comment: Thanks for the (re-)explanation. Unicode is tough! Basically, this is the issue I have with the folding in the end: what used to be a proper alpha string is no longer one after lower(), because the second codepoint is treated as punctuation, and I use a regex split on the \W word class that then behaves differently once the string is lowercased, since there is an extra non-word character to break on. I will find a way around these (rare) cases alright! Sorry for the noise. ``` >>> 'İ'.isalpha() True >>> 'İ'.lower().isalpha() False ``` -- ___ Python tracker <https://bugs.python.org/issue34723> ___
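[Editor's note] One possible workaround for these (rare) cases, sketched here under the assumption that dropping the dot-above is acceptable for the application: lowercase, then strip combining marks with unicodedata.

```python
import unicodedata

def lower_no_marks(s: str) -> str:
    # NFD exposes combining marks (such as U+0307 COMBINING DOT ABOVE,
    # produced by 'İ'.lower()) so they can be filtered out.
    decomposed = unicodedata.normalize("NFD", s.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(lower_no_marks("İ"))            # 'i'
print(lower_no_marks("İ").isalpha())  # True
```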
[issue4963] mimetypes.guess_extension result changes after mimetypes.init()
Philippe Ombredanne added the comment: The changes introduced by this ticket in https://github.com/python/cpython/commit/9fc720e5e4f772598013ea48a3f0d22b2b6b04fa#r45794801 are problematic. I discovered this from tests failing when testing on Python 3.7 and up. The bug is that calling mimetypes.init(files) will NOT use only my files, but will instead use both my files and knownfiles. This was not the case before: knownfiles would be ignored, as expected, when I provided my own list of files. This is a breaking API change IMHO and introduces buggy instability: even if I want to ignore knownfiles by providing my own list of files, knownfiles will always be added, and this results in erratic and buggy behaviour, as the content of "knownfiles" is essentially random, varying with the OS version and environment. The code I am using is here: https://github.com/nexB/typecode/blob/ba07c04d23441d3469dc5de911376d408514ebd8/src/typecode/contenttype.py#L308 I think we should reopen to fix (or create a new ticket). -- nosy: +pombredanne ___ Python tracker <https://bugs.python.org/issue4963> ___
[issue4963] mimetypes.guess_extension result changes after mimetypes.init()
Philippe Ombredanne added the comment: Actually this is problematic on multiple counts:
1. the behaviour changed, and this is a regression;
2. even if that new buggy behaviour were the one to keep, it should not give preference to knownfiles over init-provided files, but at least take the provided files first and knownfiles second.
-- ___ Python tracker <https://bugs.python.org/issue4963> ___
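[Editor's note] In the meantime, one way to sidestep knownfiles entirely may be to build a mimetypes.MimeTypes instance instead of calling the module-level init(); a sketch, where 'my.types' is a placeholder for your own map file (behavior worth verifying on your exact Python version):

```python
import mimetypes

# An instance starts from the built-in defaults plus the given files
# only; it does not merge in the OS-dependent knownfiles the way the
# module-level init() does since Python 3.7.
db = mimetypes.MimeTypes(filenames=("my.types",))  # placeholder path
print(db.guess_type("archive.tar.gz"))
```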
[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string
Philippe Ombredanne added the comment: Şahin Kureta, you wrote: > I know it is not finalized and released yet but are you going to > implement Version 14.0.0 of the Unicode Standard? > It finally solves the issue of Turkish lower/upper case 'I' and 'i'. Thank you for the pointer! I guess this spec could be under consideration for Python when it becomes final (but unlikely before?). -- ___ Python tracker <https://bugs.python.org/issue34723> ___
[issue41836] Improve ctypes error reporting with missing DLL path
New submission from Philippe Ombredanne : When a dependency of a DLL is missing (at least on Windows), the error "OSError: [WinError 126] The specified module could not be found" is raised when calling ctypes.CDLL(dll_path), even when this "dll_path" exists... because the error comes from another DLL. These errors are really hard to diagnose because the path of the missing DLL is not returned in the exception message. Returning it would help fix these kinds of errors quickly. Researching errors such as this one https://github.com/nexB/scancode-toolkit/issues/2236 wastes quite a bit of time and would become a non-issue if we had the path in the error message. -- components: ctypes messages: 377322 nosy: pombredanne priority: normal severity: normal status: open title: Improve ctypes error reporting with missing DLL path type: enhancement versions: Python 3.6 ___ Python tracker <https://bugs.python.org/issue41836> ___
[issue41836] Improve ctypes error reporting with missing DLL path
Philippe Ombredanne added the comment: Eric, thanks! This is IMHO a dupe of https://bugs.python.org/issue25655 in earnest, so I am closing this in favor of that one and will carry the comments over there. -- components: -Windows resolution: -> duplicate stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue41836> ___
[issue25655] Python errors related to failures loading DLL's lack information
Philippe Ombredanne added the comment: From https://bugs.python.org/issue41836, closed as a dupe of this: When a dependency of a DLL is missing (at least on Windows), the error "OSError: [WinError 126] The specified module could not be found" is raised when calling ctypes.CDLL(dll_path), even when this "dll_path" exists... because the error comes from another DLL. These errors are really hard to diagnose because the path of the missing DLL is not returned in the exception message. Returning it would help fix these kinds of errors quickly. Researching errors such as this one https://github.com/nexB/scancode-toolkit/issues/2236 wastes quite a bit of time and would become a non-issue if we had the path in the error message. And this reply from Eric Smith: https://bugs.python.org/msg377324
> Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-09-22 14:13
> My understanding is that Windows doesn't tell you which DLL is missing. I think the best we could do is append something to the error message saying "or one of its dependencies".
-- nosy: +pombredanne ___ Python tracker <https://bugs.python.org/issue25655> ___
[issue25655] Python errors related to failures loading DLL's lack information
Philippe Ombredanne added the comment: Eric Smith, you wrote:
> My understanding is that Windows doesn't tell you which DLL is missing. I think the best we could do is append something to the error message saying "or one of its dependencies".
If we have such an error message, this means the main DLL exists: the original path passed to ctypes exists and is a valid DLL, otherwise the message would be different. So I think it is always a direct or indirect dependency of that primary DLL that is missing, and we could be explicit about that in the error message. We could also provide some hints in the error message on how to research the issue, maybe? -- nosy: -altendky ___ Python tracker <https://bugs.python.org/issue25655> ___
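[Editor's note] In the meantime, a crude way to narrow down the missing dependency is to try loading each suspected dependent DLL individually; a sketch, where the candidates list must come from an external tool such as `dumpbin /DEPENDENTS` (the DLL names below are hypothetical):

```python
import ctypes
import ctypes.util

# Names obtained from `dumpbin /DEPENDENTS mylib.dll` (hypothetical).
candidates = ["vcruntime140.dll", "libcrypto-1_1-x64.dll"]

for name in candidates:
    if ctypes.util.find_library(name) is None:
        print(f"not found on the DLL search path: {name}")
    else:
        try:
            ctypes.CDLL(name)  # probe-load each dependency in isolation
        except OSError as exc:
            print(f"{name} failed to load: {exc}")
```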
[issue25655] Python errors related to failures loading DLL's lack information
Philippe Ombredanne added the comment:
> I wouldn't refuse a docs PR to add a short section pointing to
> this page and explaining its relevance:
> https://docs.microsoft.com/cpp/build/reference/dependents
Steve, would you see this as a note in https://docs.python.org/3/library/ctypes.html?highlight=ctypes#loading-shared-libraries? What about something like this?

class ctypes.CDLL

Note: On Windows, a call to CDLL(name) may fail even if the DLL name exists, when a dependent DLL of this DLL is not found. This will lead to an OSError with the message "[WinError 126] The specified module could not be found". This error message does not contain the name of the missing DLL, because the Windows API does not return this information, making this error hard to diagnose. To resolve the error and determine which DLL is missing, you need to find the list of dependent DLLs using Windows debugging and tracing tools. See https://docs.microsoft.com/cpp/build/reference/dependents for some explanations.

-- ___ Python tracker <https://bugs.python.org/issue25655> ___
[issue25655] Python errors related to failures loading DLL's lack information
Change by Philippe Ombredanne : -- keywords: +patch pull_requests: +21411 stage: -> patch review pull_request: https://github.com/python/cpython/pull/22372 ___ Python tracker <https://bugs.python.org/issue25655> ___
[issue25655] Python errors related to failures loading DLL's lack information
Philippe Ombredanne added the comment: So the other locations to add docs would potentially be:
- https://docs.python.org/3/library/os.html?#os.add_dll_directory
- https://docs.python.org/3/extending/windows.html
- https://docs.python.org/3/faq/windows.html#is-a-pyd-file-the-same-as-a-dll
- https://docs.python.org/3/using/windows.html#finding-modules
Which ones would be best? Also, AFAIK there is no Windows-specific Sphinx tag beyond .. availability:: -- ___ Python tracker <https://bugs.python.org/issue25655> ___
[issue6717] Some problem with recursion handling
Changes by Philippe Devalkeneer : -- nosy: +flupke ___ Python tracker <http://bugs.python.org/issue6717> ___
[issue37783] int returns error (SystemError)
New submission from Philippe Négrel : Whenever I compile the code, I get this error: Exception has occurred: SystemError: <class 'int'> returned a result with an error set. This issue occurred at line 32 of the file "SaveTools.py" in this GitHub branch: https://github.com/l3alr0g/Wave-simulator/tree/python-bug-report (sorry, I couldn't get the 'GitHub PR' option to work) -- components: Windows messages: 349158 nosy: Philippe Négrel, paul.moore, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: int returns error (SystemError) type: compile error versions: Python 3.7 ___ Python tracker <https://bugs.python.org/issue37783> ___
[issue37783] int returns error (SystemError)
Philippe Négrel added the comment: My bad, the compilation error came from the panda3d engine, which somehow affected the int class. Issue solved, sorry for wasting your time X) -- resolution: third party -> not a bug stage: test needed -> resolved status: pending -> closed ___ Python tracker <https://bugs.python.org/issue37783> ___
[issue31535] configparser unable to write comment with an upper case letter
New submission from Philippe Wagnieres: I create entries with this:

self.settings.set('General', 'Initial filter', 'All file (*.*)')
self.settings.set('General', '# 1 => Text files (*.txt)')
self.settings.set('General', '# 2 => CSV files (*.csv)')
self.settings.set('General', '# 3 => Text files (*.txt) \n')

and after writing to a file:

initial filter = All file (*.*)
; 1 => text files (*.txt)
; 2 => csv files (*.csv)
# 3 => text files (*.txt)

(';' or '#' to test if they differ) Is this normal or not? Thanks & best regards -- messages: 302633 nosy: philippewagnie...@hispeed.ch priority: normal severity: normal status: open title: configparser unable to write comment with an upper case letter type: enhancement versions: Python 3.6 ___ Python tracker <https://bugs.python.org/issue31535> ___
[issue24960] Can't use lib2to3 with embeddable zip file.
Philippe Pinard added the comment: Like Sébastien Taylor, I ran into the same problem. The workaround I found was to unzip the content of python35.zip and put it in the Lib/ folder. -- nosy: +ppinard ___ Python tracker <http://bugs.python.org/issue24960> ___
[issue21109] tarfile: Traversal attack vulnerability
Philippe Godbout added the comment: Lars, I think the suggested approach is great. Documentation for the tarfile class should be changed to direct users to the "safe" version, with a relevant warning, a bit like what is done for PRNG safety. As stated by Eduardo, an optional "safe" parameter to opt into safe mode could also be an interesting approach. -- nosy: +Philippe Godbout ___ Python tracker <https://bugs.python.org/issue21109> ___
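[Editor's note] Until such a mode exists in the stdlib, the usual mitigation is to validate member paths before extraction; a minimal sketch of that check (newer Pythons, 3.12+, can instead pass filter="data" to extractall):

```python
import os
import tarfile

def safe_extractall(archive: str, dest: str) -> None:
    dest = os.path.realpath(dest)
    with tarfile.open(archive) as tf:
        for member in tf.getmembers():
            target = os.path.realpath(os.path.join(dest, member.name))
            # Reject members that would escape dest via ../ or absolute paths.
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError(f"blocked path traversal: {member.name}")
        tf.extractall(dest)
```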
[issue31535] configparser unable to write comment with an upper case letter
Philippe Wagnieres added the comment: Thanks for your support. Sorry, I have no time to reply at length and keep working with Python, but I have understood the solution. Best regards, Philippe Wagnières Chalamont 6 1400 Yverdon-les-Bains Suisse tel.: +41 76 367 27 43

On 24.09.2018 at 17:42, Karthikeyan Singaravelan wrote:
> Karthikeyan Singaravelan added the comment:
> All config options, including comments, are converted to lowercase when they are stored. You can customize this behavior using https://docs.python.org/3/library/configparser.html#configparser.ConfigParser.optionxform . You can also refer to https://docs.python.org/3/library/configparser.html#customizing-parser-behaviour for more customization. I am closing this as not a bug as part of triaging. Feel free to reopen this if needed.
> Thanks for the report Philippe!
> --
> resolution: -> not a bug
> stage: -> resolved
> status: open -> closed

-- ___ Python tracker <https://bugs.python.org/issue31535> ___
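[Editor's note] For the record, the customization mentioned in the quoted reply is one line: replace the parser's optionxform (which lowercases option names by default) with str.

```python
import configparser

parser = configparser.ConfigParser()
parser.optionxform = str  # keep option names exactly as written
parser["General"] = {"Initial Filter": "All file (*.*)"}

with open("settings.ini", "w") as fh:
    parser.write(fh)  # writes "Initial Filter = All file (*.*)"
```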
[issue22327] test_gdb failures on Ubuntu 14.10
Changes by Philippe Devalkeneer : -- nosy: +flupke ___ Python tracker <http://bugs.python.org/issue22327> ___
[issue14076] sqlite3 module ignores placeholders in CREATE TRIGGER code
Changes by Philippe Devalkeneer : -- nosy: +flupke ___ Python tracker <http://bugs.python.org/issue14076> ___
[issue1525806] Tkdnd mouse cursor handling patch
Changes by Philippe Devalkeneer : -- nosy: +flupke ___ Python tracker <http://bugs.python.org/issue1525806> ___
[issue22399] Doc: missing anchor for dict in library/functions.html
New submission from Philippe Dessauw: There is a missing anchor for the dict function in the documentation at library/functions.html. The problem is present in the documentation of all Python versions, and it seems to impact cross-referencing in Sphinx (using intersphinx). -- assignee: docs@python components: Documentation messages: 226845 nosy: docs@python, pdessauw priority: normal severity: normal status: open title: Doc: missing anchor for dict in library/functions.html type: enhancement versions: Python 2.7 ___ Python tracker <http://bugs.python.org/issue22399> ___
[issue19887] Path.resolve() fails on complex symlinks
Philippe Fremy added the comment: Hi, this precise set of tests fails on Windows 7 on an NTFS partition (on revision c0b0e7aef360+ tip), see below. The problem is probably minor (drive letter case). I won't be able to develop a fix myself, but I'll be happy to test one. Cheers, Philippe

==
FAIL: test_complex_symlinks_absolute (test.test_pathlib.PathTest)
--
Traceback (most recent call last):
  File "c:\Users\Morgane\Documents\000\Dev\CPython\cpython\lib\test\test_pathlib.py", line 1724, in test_complex_symlinks_absolute
    self._check_complex_symlinks(BASE)
  File "c:\Users\Morgane\Documents\000\Dev\CPython\cpython\lib\test\test_pathlib.py", line 1692, in _check_complex_symlinks
    self.assertEqual(str(p), BASE)
AssertionError: 'C:\\Users\\Morgane\\Documents\\000\\Dev\\[53 chars]_tmp' != 'c:\\Users\\Morgane\\Documents\\000\\Dev\\[53 chars]_tmp'
- C:\Users\Morgane\Documents\000\Dev\CPython\cpython\build\test_python_6060\@test_6060_tmp
? ^
+ c:\Users\Morgane\Documents\000\Dev\CPython\cpython\build\test_python_6060\@test_6060_tmp
? ^

==
FAIL: test_complex_symlinks_relative (test.test_pathlib.PathTest)
--
Traceback (most recent call last):
  File "c:\Users\Morgane\Documents\000\Dev\CPython\cpython\lib\test\test_pathlib.py", line 1728, in test_complex_symlinks_relative
    self._check_complex_symlinks('.')
  File "c:\Users\Morgane\Documents\000\Dev\CPython\cpython\lib\test\test_pathlib.py", line 1692, in _check_complex_symlinks
    self.assertEqual(str(p), BASE)
AssertionError: 'C:\\Users\\Morgane\\Documents\\000\\Dev\\[53 chars]_tmp' != 'c:\\Users\\Morgane\\Documents\\000\\Dev\\[53 chars]_tmp'
- C:\Users\Morgane\Documents\000\Dev\CPython\cpython\build\test_python_6060\@test_6060_tmp
? ^
+ c:\Users\Morgane\Documents\000\Dev\CPython\cpython\build\test_python_6060\@test_6060_tmp
? ^

==
FAIL: test_complex_symlinks_relative_dot_dot (test.test_pathlib.PathTest)
--
Traceback (most recent call last):
  File "c:\Users\Morgane\Documents\000\Dev\CPython\cpython\lib\test\test_pathlib.py", line 1732, in test_complex_symlinks_relative_dot_dot
    self._check_complex_symlinks(os.path.join('dirA', '..'))
  File "c:\Users\Morgane\Documents\000\Dev\CPython\cpython\lib\test\test_pathlib.py", line 1692, in _check_complex_symlinks
    self.assertEqual(str(p), BASE)
AssertionError: 'C:\\Users\\Morgane\\Documents\\000\\Dev\\[53 chars]_tmp' != 'c:\\Users\\Morgane\\Documents\\000\\Dev\\[53 chars]_tmp'
- C:\Users\Morgane\Documents\000\Dev\CPython\cpython\build\test_python_6060\@test_6060_tmp
? ^
+ c:\Users\Morgane\Documents\000\Dev\CPython\cpython\build\test_python_6060\@test_6060_tmp
? ^

==
FAIL: test_complex_symlinks_absolute (test.test_pathlib.WindowsPathTest)
--
Traceback (most recent call last):
  File "c:\Users\Morgane\Documents\000\Dev\CPython\cpython\lib\test\test_pathlib.py", line 1724, in test_complex_symlinks_absolute
    self._check_complex_symlinks(BASE)
  File "c:\Users\Morgane\Documents\000\Dev\CPython\cpython\lib\test\test_pathlib.py", line 1692, in _check_complex_symlinks
    self.assertEqual(str(p), BASE)
AssertionError: 'C:\\Users\\Morgane\\Documents\\000\\Dev\\[53 chars]_tmp' != 'c:\\Users\\Morgane\\Documents\\000\\Dev\\[53 chars]_tmp'
- C:\Users\Morgane\Documents\000\Dev\CPython\cpython\build\test_python_6060\@test_6060_tmp
? ^
+ c:\Users\Morgane\Documents\000\Dev\CPython\cpython\build\test_python_6060\@test_6060_tmp
? ^

==
FAIL: test_complex_symlinks_relative (test.test_pathlib.WindowsPathTest)
--
Traceback (most recent call last):
  File "c:\Users\Morgane\Documents\000\Dev\CPython\cpython\lib\test\test_pathlib.py", line 1728, in test_complex_symlinks_relative
    self._check_complex_symlinks('.')
  File "c:\Users\Morgane\Documents\000\Dev\CPython\cpython\lib\test\test_pathlib.py", line 1692, in _check_complex_symlinks
    self.assertEqual(str(p), BASE)
AssertionError: 'C:\\Users\\Morgane\\Documents\\000\\Dev\\[53 chars]_tmp' != 'c:\\Users\\Morgane\\Documents\\000\\Dev\\[53 chars]_tmp'
- C:\Users\Morgane\Documents\000\Dev\CPython\cpython\build\test_python_6060\@test_6060_tmp
? ^
+ c:\Users\Morgane\Documents\000\Dev\CPython\cpython\build
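[Editor's note] Since only the case of the drive letter differs, one plausible fix (a sketch, not the actual CPython change) is to compare paths through os.path.normcase, which lowercases and normalizes separators on Windows and is a no-op on POSIX:

```python
import os.path

def same_path(a: str, b: str) -> bool:
    # 'C:\\x' and 'c:\\x' compare equal on Windows; unchanged on POSIX.
    return os.path.normcase(a) == os.path.normcase(b)

print(same_path(r"C:\Users\x", r"c:\Users\x"))  # True on Windows
```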
[issue7132] Regexp: capturing groups in repetitions
New submission from Philippe Verdy : For now, when capturing groups are used within repetitions, it is impossible to capture what they match individually within the list of matched repetitions. E.g. the following regular expression: (0|1[0-9]{0,2}|2(?:[0-4][0-9]?|5[0-5]?)?)(?:\.(0|1[0-9]{0,2}|2(?:[0-4][0-9]?|5[0-5]?)?)){3} is a regexp that contains two capturing groups (\1 and \2), of which the second is repeated (3 times) to match an IPv4 address in dotted decimal format. We'd like to be able to get the individual multiple matches of the second group. For now, capturing groups don't record the full list of matches, but just override the last occurrence of the capturing group (or just the first if the repetition is not greedy, which is not the case here because the repetition "{3}" is not followed by a "?"). So \1 will effectively return the first decimal component of the IPv4 address, but \2 will just return the last (fourth) decimal component. I'd like the possibility of a compilation flag "R" indicating that capturing groups will not just return a single occurrence, but all occurrences of the same group. If this "R" flag is enabled, then:
- Match.group(index) will not return just a single string but a list of strings, with as many elements as the number of effective repetitions of the capturing group. The last element in that list will be the one equal to the current behavior.
- Match.start(index) and Match.end(index) will both return a list of positions, those lists having the same length as the list returned by Match.group(index).
- For consistency, the returned values as lists of strings (instead of just single strings) will apply to all capturing groups, even those that are not repeated.
Effectively, with the same regexp above, we will be able to retrieve (and possibly substitute):
- the first decimal component of the IPv4 address with "{\1:1}" (or "{\1:}" or "{\1}" or "\1" as before), i.e. the 1st (and last) occurrence of capturing group 1, or in Match.group(1)[1], or between string positions Match.start(1)[1] and Match.end(1)[1];
- the second decimal component of the IPv4 address with "{\2:1}", i.e. the 1st occurrence of capturing group 2, or in Match.group(2)[1], or between string positions Match.start(2)[1] and Match.end(2)[1];
- the third decimal component of the IPv4 address with "{\2:2}", i.e. the 2nd occurrence of capturing group 2, or in Match.group(2)[2], or between string positions Match.start(2)[2] and Match.end(2)[2];
- the fourth decimal component of the IPv4 address with "{\2:3}" (or "{\2:}" or "{\2}" or "\2"), i.e. the 3rd (and last) occurrence of capturing group 2, or in Match.group(2)[3], or between string positions Match.start(2)[3] and Match.end(2)[3].
This should work with all repetition patterns (both greedy and non-greedy, atomic or not, or possessive), in which the repeated pattern contains any capturing group. This idea should also be submitted to the developers of the PCRE library (and of Perl, from which it originates, and PHP, where PCRE is also used), so that they adopt a similar behavior in their regular expressions. If there is already a candidate syntax or compilation flag in those libraries, that syntax should be used for repeated capturing groups.
-- components: Library (Lib) messages: 94022 nosy: verdy_p severity: normal status: open title: Regexp: capturing groups in repetitions type: feature request versions: Python 3.2 ___ Python tracker <http://bugs.python.org/issue7132> ___
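[Editor's note] For comparison, the third-party regex module (a drop-in alternative to re) exposes essentially the behaviour requested here through Match.captures(), which records every repetition of a group rather than only the last one:

```python
import regex  # third-party: pip install regex

m = regex.match(r'^(\d{1,3})(?:\.(\d{1,3})){3}$', '192.168.0.1')
print(m.group(2))                     # '1'  (stdlib-style: last repetition only)
print(m.captures(2))                  # ['168', '0', '1']
print(m.captures(1) + m.captures(2))  # ['192', '168', '0', '1']
```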
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: I'd like to add that the same behavior should also affect the span(index) method of MatchObject, which should likewise not return just a single (start, end) pair, but a list of pairs, one for each occurrence, when the "R" compilation flag is specified. This also means that the regular expression compilation flag R should be exposed as constants: Regexp.R or Regexp.REPETITIONS. -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: Rationale for the compilation flag: you could think that the compilation flag should not be needed. However, not using it would mean that a LOT of existing regular expressions that already contain capturing groups in repetitions, and for which the capturing group is effectively not used and would have been better encoded as a non-capturing group like (?:X) instead of (X), will suffer a negative performance impact and higher memory usage. The reason is that the MatchObject will have to store lists of (start, end) pairs instead of just a single pair. Using a list will not be the default, so MatchObject.group(groupIndex), MatchObject.start(groupIndex), MatchObject.end(groupIndex), and MatchObject.span(groupIndex) will continue to return a single value or single pair when the R compilation flag is not set (these values will continue to return only the last occurrence, overridden after each matched occurrence of the capturing group). MatchObject.groups() will also continue to return a list of single strings without that flag set (i.e. a list of the last occurrence of each capturing group). But when the R flag is specified, it will instead return a list of lists, each element belonging to the group with the same index and being itself a list of strings, one for each occurrence of the capturing group. -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: Implementation details: currently, capturing groups behave quite randomly in the values returned by MatchObject when backtracking occurs in a repetition. This proposal will help fix that behavior, because it will also be much easier to backtrack cleanly, occurrence by occurrence, by just dropping the last element in the list of (start, end) pairs stored in the MatchObject for all capturing groups specified WITHIN the repeated sub-expression. -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: Note that I used the IPv4 address format only as an example. There are plenty of other, more complex cases for which we really need to capture the multiple occurrences of a capturing group within a repetition. I'm NOT asking you how to parse it using MULTIPLE regexps and functions. Of course you can, but that is a distinct problem and certainly NOT a general solution (your solution using split() will NOT work with really A LOT of other regular expressions). -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: In addition, your suggested regexp for IPv4: '^(\d{1,3})(?:\.(\d{1,3})){3}$' is completely WRONG! It will match INVALID IPv4 address formats like "000.000.000.000". Reread the RFCs... because "000.000.000.000" is CERTAINLY NOT an IPv4 address (if it is found in a URL) but a domain name that must be resolved into an IP address using domain-name resolution requests. -- ___ Python tracker <http://bugs.python.org/issue7132> ___
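[Editor's note] For readers wanting a stricter pattern: a common octet subexpression that enforces both the 0-255 range and no leading zeros, shown as a sketch (how strict to be depends on which textual forms the application accepts):

```python
import re

octet = r'(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'
ipv4 = re.compile(r'^' + octet + r'(?:\.' + octet + r'){3}$')

print(bool(ipv4.match('192.168.0.1')))      # True
print(bool(ipv4.match('000.000.000.000')))  # False (leading zeros rejected)
print(bool(ipv4.match('256.1.1.1')))        # False (out of range)
```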
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: You're wrong, it WILL be compatible, because it is only conditioned by a FLAG. The flag is there specifically to instruct the parser to generate lists of values rather than single values. Without the compilation flag set, as I said, there will be NO change. Reopening the proposal, which is perfectly valid! -- status: pending -> open ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: Summary of your points, with my responses:

> 1) it doesn't exist in any other implementation that I know;
That's exactly why I proposed to discuss it with the developers of other implementations (I cited the PCRE, Perl and PHP developers; there are others).

> 2) if implemented as default behavior:
> * it won't be backward-compatible;
Wrong. This does not even change the syntax of regular expressions themselves.

> * it will increase the complexity;
Wrong. All the mechanics are already implemented: when the parser stores string index positions for a matched group, it will just have to append a pair to the list stored in MatchObject.group(index) when the R flag is set (creating the list if it is not already there, though it should already be initialized to an empty list by the compiler), instead of overwriting the existing pair without checking whether there was already another occurrence of the same capturing group.

> 3) it will be a proprietary extension and it will reduce the compatibility with other implementations;
Already addressed above. This will however NOT affect the compatibility of existing implementations that don't have the R flag.

> 4) I can't think to any real word situation where this would be really useful.
There are really a lot! Using multiple split operations and multiple passes over partly parsed regular expressions is not a solution in many situations (think about how you would perform matches and use them in 'vi' or 'ed' with a single "s/regexp/replacement/flag" instruction, if there's no extension with a flag and a syntax for accessing the individual elements in the replacement string).

-- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: And anyway, my suggestion is certainly much more useful than atomic groups and possessive groups, which have much lower use, and which are already being tested in Perl but which Python (or PCRE, PHP, and most implementations of 'vi'/'ed' or 'sed') still does not support. -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: I had read carefully ALL that ezio said; this is clear from the fact that I summarized my responses to ALL four points he gave. Capturing groups are a VERY useful feature of regular expressions, but they currently DON'T work as expected (in a useful way) when used within repetitions (unless you need no captures at all, for example when just using find() and not performing substitutions on the groups). My proposal would have absolutely NO performance impact when capturing groups are not used (find only, without replacement, so the R flag can be safely ignored there). It would also not affect the case where capturing groups are used in the regexp but not referenced in the substitution or in the code using MatchObject.group(index): those indexes are already unused (or should be, because most of the time it is a bug when the call just returns the last occurrence). Using multiple parsing operations with multiple regexps is really tricky, when all of it could be done directly from the original regexp, without modifying it. In addition, using split() or similar will not work as expected when the splitting operations do not correctly parse the context in which the multiple occurrences are safely separated (that context is only correctly specified in the original regexp, where the groups, capturing or not, are specified). This extension will also NOT affect non-capturing groups like (?:X){m,n}, (?:X)*, (?:X)+. It will ONLY affect CAPTURING groups like (X){m,n}, (X)*, (X)+, and only if the R flag is set (in which case this will NOT affect the backtracking behavior, or which strings are effectively matched, but only the values of the returned "\n" indexed groups). If my suggestion to keep the existing MatchObject.function(index) API looks too dangerous to you, because it would change the type of the returned values when the R flag is set, you could instead add renamed methods to get a specific occurrence of a group, such as: MatchObject.groupOccurrences(index), MatchObject.startOccurrences(index), MatchObject.endOccurrences(index), MatchObject.spanOccurrences(index), MatchObject.groupsOccurrences(index). But I don't think this is necessary; it will already be expected that they return lists of values (or lists of pairs) instead of single values (or single pairs) for each group: Python (as well as PHP or Perl) can already manage return values with varying datatypes. Maybe only PCRE (written for C/C++) would need a new API name to return lists of values instead of single values for each group, due to existing datatype restrictions. My proposal is not inconsistent: it returns consistent datatypes when the R flag is set, for ALL capturing groups (not just those that are repeated). Anyway, I'll submit my idea to other groups, if I can find where to post it. Note that I've already implemented it in my own local copy of PCRE, and it works perfectly with effectively very few changes (mainly changing the datatypes of matching objects so that they can return varying types). I have used it to create a modified version of 'sed' that performs massive filtering of data: it really reduces the number of transformation steps needed to process such data correctly, because a single regexp (exactly the same one already used in the first step to match the substrings we are interested in, with existing 'sed' implementations) can be used to perform the substitutions using indexes within captured groups. And I would like to have it incorporated in Python (and also Perl or PHP) as well. -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: ezio said:
>>> re.match('^(\d{1,3})(?:\.(\d{1,3})){3}$', '192.168.0.1').groups()
('192', '1')
> If I understood correctly what you are proposing, you would like it to return (['192'], ['168', '0', '1']) instead.
Yes, exactly! That's the correct answer that should be returned when the R flag is set.
> This will also require an additional step to join the two lists to get the list with the 4 values.
Yes, but this is necessary for full consistency of the group indexes. The current return value is clearly inconsistent (generally it returns the last occurrence of the capturing group, but I've discovered that this is not always the case, because of matches that are returned after backtracking...). It is then assumed that when the R flag is set, ALL occurrences of repeated groups will be returned individually, instead of just a 'random' one. Note that for full generalization of the concept, there should even be lists of lists when a capturing group itself contains another inner capturing group (with its own index), in order to associate them correctly with each occurrence of the outer capturing group (however, I've not yet experimented with this in my local implementation, so all occurrences are grouped in the same returned list, independently of the occurrence of the outer capturing group in which they were found). -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: > That's why I wrote 'without checking if they are in range(256)'; the fact that this regex matches invalid digits was not relevant in my example (and it's usually easier to convert the digits to int and check if 0 <= digits <= 255). :)
NO! You also have to check the number of digits for values below 100 (2 digits only) or below 10 (1 digit only). And when processing web log files, for example, or when parsing wiki pages or emails in which you want to autodetect the presence of ONLY valid IP addresses within certain contexts where you want to transform them into another form (for example, when converting them to links, or to differentiate 'anonymous' users in wiki pages from registered named users), you need to match these IP addresses correctly. In addition, these files will often contain many other occurrences that you don't want to transform, only some of them in specific contexts given by the regexp. For this reason, your suggestion will often not work as expected. The real need is to match things exactly, within their context, while capturing all occurrences of capturing groups. I gave the IPv4 regexp only as a simple example to show the need, but there are of course much more complex cases, and it's exactly for those cases that I would like the extension: using alternate code with partial matches and extra split() operations gives code that becomes tricky, and most often buggy. Only the original regexp is precise enough to parse the content correctly, find only the matches we want, and capture all the groups that we really want, in a single operation, with near-zero cost (and without complicating the rest of the code using it). -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: > Even with your solution, in most of the cases you will need additional steps to assemble the results (at least in the cases with some kind of separator, where you have to join the first element with the followings).
Yes, but that step is trivial and fully predictable; much more viable than the other proposed solutions, which give tricky and often complex and buggy code. How many bugs have been found in code using split(), for example, to parse URLs? Countless, in many programs (and the list keeps growing!). In the end, the only solution is to simply rewrite the parser completely, without regexps at all, or to reduce the generality of the problems the program was supposed to solve (i.e. asserting implementation limits in the code, rejecting some forms that were previously considered valid). Think about it: capturing groups are the perfect solution for solving the problem cleanly, provided that they work as intended and return all their occurrences. -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: > a "general" regex (e.g. for an ipv6 address)
I know this problem, and I have already written about it. It is not possible to parse it with a single regexp written without repetitions; in that case, the regexp becomes really HUGE, and the number of groups in the returned match object is prohibitive. That's why CPAN needed a specific module for IPv6 addresses in Perl. Such a module could be reduced to a couple of lines with a single regexp, if its capturing groups correctly returned ALL their occurrences from the regexp engine: it would require no further processing or analysis, and the data could effectively be reassembled cleanly, just from the returned groups (as lists):
- \1 and \2 (for hex components of IPv6 in hex-only format, where \1 can occur 0 or 1 time, and \2 can occur 0 to 7 times)
- or from \1 to \2 and \3 to \4 (hex components in \1..\2, where \1 occurs 0 or 1 time and \2 occurs 0 to 5 times; decimal components in \3..\4, where \3 occurs 1 time and \4 occurs exactly 3 times).
-- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: >> And anyway, my suggestion is certainly much more useful than atomic >> groups and possessive groups that have much lower use [...]
> Then why no one implemented it yet? :)
Because they had to use something other than regexps to do their parsing. All those who had to do that complained that regexps were not capturing all occurrences. And later they regretted it, because they had to fix their alternate code (such as the buggy split() alternatives...) and finally wrote their own parsers (sometimes with a combination of (f)lex+yacc/bison, even when the full expression was expressible in a single regexp precise enough to match exactly what they wanted, just without using the returned captured groups): this means an extra parse of the found substring (in its correct context) in order to process it. -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment:
>>> re.match('^(\d{1,3})(?:\.(\d{1,3})){3}$', '192.168.0.1').groups()
('192', '1')
> If I understood correctly what you are proposing, you would like it to return (['192'], ['168', '0', '1']) instead.
In fact, it could be assembled into a single array directly in the regexp, by naming the destination capturing group (reusing the same name, so it would get the same group index instead of allocating a new one). E.g., with something like:
>>> re.match('^(?P<parts>\d{1,3})(?:\.(?P<parts>\d{1,3})){3}$', '192.168.0.1').groups()
this would return ("parts": ['192', '168', '0', '1']) in the same first group. This could be used in PHP as well (which supports associative arrays for named groups that are also indexed positionally). -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: You said that this extension was not implemented anywhere, and you were wrong: I've found that it IS implemented in Perl 6! Look at this discussion: http://www.perlmonks.org/?node_id=602361 and at how the matches in quantified capture groups are returned as arrayrefs (i.e. references to a numbered array). So my idea is not stupid. Given that Perl rules the world of the regexp language, it will be implemented elsewhere sooner or later, notably in PCRE, awk, vi, sed, PHP, the .NET library... Already, this is used in CPAN libraries for Perl v6 (when the X flag is set for extensions). -- ___ Python tracker <http://bugs.python.org/issue7132> ___
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: Anyway, there are ways to speed up regexps, even without instrumenting them with anti-backtracking syntaxes. See http://swtch.com/~rsc/regexp/regexp1.html (article dated January 2007), which discusses how Perl, PCRE (and PHP), Python, Java, Ruby, the .NET library... are slow because they backtrack with a single active state in the NFA, instead of using simultaneously active states (which correctly emulates the DFA without actually building the DFA transition-state table, which can grow combinatorially, as seen in yacc and Bison). Java uses the Thompson approach in its latest releases, but I wonder how Python works: does it use the DFA simulation? Does it use PCRE? Note that I have been using the DFA simulation for more than 20 years, since 1987, when I built my first regexp matcher (because the existing implementations at that time were really damn slow), after reading the Aho/Sethi/Ullman book, which already demonstrated the algorithm. This algorithm has been implemented in some tools replacing the old lex/yacc tools, because those generate huge DFA transition tables (and this was the reason why Lex and Yacc were maintained as separate tools: Lex using the single-state NFA approach with excessive backtracking, and Yacc/Bison trying to generate the full DFA transition tables). The first language to use this approach was the Purdue University Compiler Construction Tool Set (PCCTS), which was initially written in C and is now fully written and supported in Java. The Perl 6 extension for quantified capturing groups will see slow adoption as long as Perl continues with the slow single-state NFA approach with excessive backtracking, instead of the Aho/Sethi/Ullman (ASU) approach using simultaneously active states (which some attribute to Thompson due to the 2007 article, but this is false). But anyway, it already exists (and Perl developers are already working on rewriting the engine using the ASU approach). My suggestion is much more general, though, as it should apply not just to quantified capturing groups, but to any capturing group that is part of a quantified subexpression. And the way I specified it, it does not depend on how the engine is written (whether it uses a single-state NFA, a multi-state NFA, or generates and uses a DFA transition table as in Yacc/Bison), because capturing groups just store position pairs, and regexp engines already have to count repetitions to match greedy and non-greedy quantifiers, so this immediately provides a usable index for storing captured groups at successive positions in a numbered array. The simple test case is to try to match /(aa?)*a+/ against strings longer than a few dozen 'a's. -- ___ Python tracker <http://bugs.python.org/issue7132> ___
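[Editor's note] The pathological behaviour is easy to reproduce with that exact test case; on a backtracking engine such as Python's re, the time to reject grows roughly exponentially with the input length (a small, runnable demonstration, with n kept modest so it finishes):

```python
import re
import time

pattern = re.compile(r'^(aa?)*a+$')

for n in (24, 27, 30, 33):
    text = 'a' * n + 'b'  # the trailing 'b' guarantees a failed match
    start = time.perf_counter()
    assert pattern.match(text) is None
    print(f"n={n}: {time.perf_counter() - start:.3f}s")
```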
[issue7132] Regexp: capturing groups in repetitions
Philippe Verdy added the comment: Umm, I said that the attribution to Thompson was wrong; in fact it was correct. Thompson designed and documented the algorithm in 1968, long before the Aho/Sethi/Ullman green book... so the algorithm is more than 40 years old, and still not in Python, Perl or PCRE (but it is present in GNU awk...). The paper published on swtch.com was indeed written in 2007, but its conclusions are perfectly valid. The interesting aspect of this paper is that it demonstrates that Thompson's multi-state NFA approach is still the best one, and better than what Perl, PCRE (and PHP), Python, Ruby and others are using, but that it can also be optimized further by caching the DFA construction "on the fly" (see the blue curve on the displayed diagram) while parsing the already precompiled multi-state NFA. The cache of DFA states fills up while matching the regexp against actual strings, so it will be much faster (and much less memory-and-time-greedy than generating the full DFA transition table at compilation time like in Bison/Yacc).

However, the paper still does not discuss how to keep the DFA state cache limited in size, notably because the longer the input text, the more DFA states the cache will contain. One simple rule is to limit the number of cached DFA states, and then to allow all stored transitions to go to multiple states in the NFA, and optionally to a single DFA state in the cache. The DFA cache can then be managed in an LRU manner, purging it automatically of unused states in order to reuse them (for caching another new DFA state which is not yet present in the cache, when the cache has reached its maximum size): if this occurs, the other existing DFA states that point to the evicted one must be cleaned (their DFA state pointer or reference, stored in their NFA or DFA transitions, must be cleared/set to null, so that they will only contain the list of pointers to outgoing NFA states). The problem is how to look up a multi-state in the DFA cache: this requires some fast lookup, but it can be implemented efficiently using hash tables (by hashing the list of NFA states represented in the DFA state).

Apparently, GNU awk does not use the cached DFA approach: it just uses the NFA directly when the input text is shorter than two dozen characters, then builds the full DFA as soon as the input text becomes larger (this explains the sudden but moderate increase in time). But I've seen that GNU awk has the defect of this unlimited DFA generation approach: excessive use of memory as the input text grows, because the number of DFA states added to the cache is constantly growing with the input, and the time to match each character of the input slowly increases too. At some point it will crash and exit due to memory exhaustion, when no more DFA states can be stored. That's why the DFA cache MUST be kept under some size limit.

I'll try to implement this newer approach first in Java (just because I know this language better than Python, and because I think it is more solid in terms of type-safety, so it can reduce the number of bugs to correct before getting something stable). In Java, there's a clean way to automatically clean up objects from collections when they are no longer needed: you just need to use weak references (the garbage collector will automatically clean up the older objects when needed).
But this approach is not easy to port, and in fact, even if it can effectively solve some problems, it will still not prevent the cache from growing up to the maximum VM size. For performance reasons, I see little interest in storing more than about 1 million DFA states in the cache (also because the hash table used would be less efficient when looking up the key of an NFA multi-state where the DFA state is stored). So I will probably not use weak references, but will first use a maximum size (even if weak references could help maintain the cache at even lower bounds than the allowed maximum, when VM memory is more constrained: it is a good idea in all Java programs to allow caches introduced only for performance reasons to also collaborate with the garbage collector, in order to avoid the explosion of all the caches used in various programs or libraries). I don't know if Python supports the concept of weak references for handling performance caches. Maybe I'll port it to Python later, but don't expect that I'll port it to C/C++ (as a replacement for PCRE), because I now hate these unsafe languages even though I have practiced them for many years: others can do that for me once I have published my Java implementation. -- ___ Python tracker <http://bugs.python.org/issue7132> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
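Python does support this: the stdlib `weakref` module provides WeakValueDictionary, which behaves much like a Java WeakReference-based cache. A sketch of the bounded DFA-state cache described above, where the class names and state representation are hypothetical placeholders rather than any real engine's API:

    from collections import OrderedDict
    import weakref

    class DFAState:
        def __init__(self, nfa_states):
            self.nfa_states = frozenset(nfa_states)  # hashable key: the set of NFA states
            self.transitions = {}                    # input symbol -> DFAState (or None)

    class BoundedDFACache:
        def __init__(self, max_size=1_000_000):
            self.max_size = max_size
            self._lru = OrderedDict()                # frozenset of NFA ids -> DFAState

        def get(self, nfa_states):
            key = frozenset(nfa_states)              # hash the NFA multi-state
            state = self._lru.get(key)
            if state is not None:
                self._lru.move_to_end(key)           # mark as most recently used
                return state
            state = DFAState(key)
            self._lru[key] = state
            if len(self._lru) > self.max_size:
                self._lru.popitem(last=False)        # evict the least recently used state
            return state

    # The GC-collaborating variant: entries vanish once nothing else
    # holds a strong reference to the DFAState.
    soft_cache = weakref.WeakValueDictionary()

The sketch leaves out the cleanup of transitions pointing at an evicted state, which the text above describes; a real implementation would also null those out on eviction.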
[issue17821] Different division results with / and // operators with large numbers
New submission from Philippe Rouquier: Hi, the following statement yields different results in python2 and python3: 284397269195572115652769428988866694680//17 - int(284397269195572115652769428988866694680/17) In python3 it yields: 309657313492949847071 In python2 it yields: 0L Python2's result seems to be correct, as (284397269195572115652769428988866694680//17) and (int(284397269195572115652769428988866694680/17)) should return the same result (as far as I understand). With smaller numbers, this difference in results does not appear with python3. Note: I noticed this while working on RSA; 284397269195572115652769428988866694680 is (p-1)(q-1) and 17 is e. I just mention this in case it could help. I used Linux with Python versions 3.3.0 and 2.7.3 for the tests, on a 64-bit processor. Sorry if I am missing something here. -- components: Interpreter Core messages: 187623 nosy: Philippe.Rouquier priority: normal severity: normal status: open title: Different division results with / and // operators with large numbers type: behavior versions: Python 3.3 ___ Python tracker <http://bugs.python.org/issue17821> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
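For context on the arithmetic here: in Python 3, `/` is true division and returns a binary float with a 53-bit mantissa, so a roughly 128-bit integer is rounded before the division even happens, while `//` stays exact on arbitrary-precision ints; in Python 2, `/` between two ints was already floor division, which is why the difference there is 0L. A quick check:

    n = 284397269195572115652769428988866694680

    print(n // 17)               # exact floor division on the big int
    print(int(n / 17))           # n is rounded to a float first, losing the low bits
    print(float(n))              # the value that actually gets divided by /
    print(n // 17 - int(n / 17)) # the difference reported above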
[issue17821] Different division results with / and // operators with large numbers
Philippe Rouquier added the comment: Does your comment mean that this bug should be closed as not-a-bug, since anyone wanting to avoid such rounding errors should use the // operator? -- ___ Python tracker <http://bugs.python.org/issue17821> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue25045] smtplib throws exception TypeError: readline()
New submission from Philippe Lambotte: smtplib's smtpserver.ehlo() will throw an exception. The error message:

Traceback (most recent call last):
  File "snippet.email.sendmail.py", line 34, in <module>
    smtpserver.ehlo()
  File "/usr/lib/python3.2/smtplib.py", line 421, in ehlo
    (code, msg) = self.getreply()
  File "/usr/lib/python3.2/smtplib.py", line 367, in getreply
    line = self.file.readline(_MAXLINE + 1)
TypeError: readline() takes exactly 1 positional argument (2 given)

smtplib works with python 2.7, but not with 3.2. If I remove the passed parameter, it works in 3.2: line = self.file.readline() -- components: email messages: 250317 nosy: barry, phlambotte, r.david.murray priority: normal severity: normal status: open title: smtplib throws exception TypeError: readline() type: behavior versions: Python 3.2 ___ Python tracker <http://bugs.python.org/issue25045> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue25045] smtplib throws exception TypeError: readline()
Philippe Lambotte added the comment: The code is:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import smtplib, os
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email.utils import COMMASPACE, formatdate
from email import encoders

EMAIL_FROM = 'emailf...@mywebsite.net'
EMAIL_TO = ['emai...@mywebsite.net']
SENDER_LOGIN = 'mylogin'
SENDER_PASSWORD = 'mypassword'
SMTP_HOST = 'smtp.myhoster.net'
SMTP_PORT = 587
SMTP_IS_STARTTLS = True
SUBJECT = "This is the subject"
TEXT = 'This is a test'
FILES = list()  # ['dummy_text.txt']

def send_mail_with_attachment(send_from, send_to, subject, text, files=[],
                              server="localhost", port=587, username='',
                              password='', isTls=True):
    msg = MIMEMultipart()
    msg['From'] = send_from
    msg['To'] = COMMASPACE.join(send_to)
    msg['Date'] = formatdate(localtime=True)
    msg['Subject'] = subject
    msg.attach(MIMEText(text))
    if len(files) > 0:
        for f in files:
            part = MIMEBase('application', "octet-stream")
            part.set_payload(open(f, "rb").read())
            encoders.encode_base64(part)
            part.add_header('Content-Disposition',
                            'attachment; filename="{0}"'.format(os.path.basename(f)))
            msg.attach(part)
    smtp = smtplib.SMTP(server, port)
    if isTls:
        smtp.starttls()
    smtp.login(username, password)
    smtp.sendmail(send_from, send_to, msg.as_string())
    smtp.quit()

send_mail_with_attachment(EMAIL_FROM, EMAIL_TO, SUBJECT, TEXT, FILES, SMTP_HOST,
                          SMTP_PORT, SENDER_LOGIN, SENDER_PASSWORD, SMTP_IS_STARTTLS)

When I run it, I have the following message:

Traceback (most recent call last):
  File "snippet.email.envoyer_un_mail_au_format_html.py", line 46, in <module>
    send_mail_with_attachment(EMAIL_FROM,EMAIL_TO,SUBJECT,TEXT,FILES,SMTP_HOST,SMTP_PORT,SENDER_LOGIN,SENDER_PASSWORD,SMTP_IS_STARTTLS)
  File "snippet.email.envoyer_un_mail_au_format_html.py", line 42, in send_mail_with_attachment
    smtp.login(username,password)
  File "/usr/lib/python3.2/smtplib.py", line 595, in login
    self.ehlo_or_helo_if_needed()
  File "/usr/lib/python3.2/smtplib.py", line 554, in ehlo_or_helo_if_needed
    if not (200 <= self.ehlo()[0] <= 299):
  File "/usr/lib/python3.2/smtplib.py", line 421, in ehlo
    (code, msg) = self.getreply()
  File "/usr/lib/python3.2/smtplib.py", line 367, in getreply
    line = self.file.readline(_MAXLINE + 1)
TypeError: readline() takes exactly 1 positional argument (2 given)

-- ___ Python tracker <http://bugs.python.org/issue25045> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
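For readers hitting this today: current Python 3 releases no longer exhibit this error, and the same flow can be written more simply (3.6+) with email.message.EmailMessage and smtplib's context-manager support. A minimal sketch with placeholder host and credentials:

    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "sender@example.net"       # placeholder addresses
    msg["To"] = "recipient@example.net"
    msg["Subject"] = "This is the subject"
    msg.set_content("This is a test")

    with smtplib.SMTP("smtp.example.net", 587) as smtp:
        smtp.starttls()                      # upgrade the connection before login
        smtp.login("mylogin", "mypassword")
        smtp.send_message(msg)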
[issue2078] CSV Sniffer does not function properly on single column .csv files
Changes by Jean-Philippe Laverdure: -- components: +Library (Lib) -Extension Modules versions: +Python 2.4 __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2078> __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue2078] CSV Sniffer does not function properly on single column .csv files
New submission from Jean-Philippe Laverdure: When attempting to sniff() the dialect for the attached .csv file, csv.Sniffer.sniff() returns an unusable dialect:

>>> import csv
>>> file = open('listB2Mforblast.csv', 'r')
>>> dialect = csv.Sniffer().sniff(file.readline())
>>> file.seek(0)
>>> file.readline()
>>> file.seek(0)
>>> reader = csv.DictReader(file, dialect)
>>> reader.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/soft/bioinfo/linux/python-2.5/lib/python2.5/csv.py", line 93, in next
    d = dict(zip(self.fieldnames, row))
TypeError: zip argument #1 must support iteration

However, this works fine:

>>> file.seek(0)
>>> reader = csv.DictReader(file)
>>> reader.next()
{'Sequence': 'AALENTHLL'}

If I use a 2-column file, sniff() works perfectly. It only seems to have a problem with single-column .csv files (which are still .csv files in my opinion). Thanks for looking into this. -- components: Extension Modules files: listB2Mforblast.csv messages: 62319 nosy: jplaverdure severity: normal status: open title: CSV Sniffer does not function properly on single column .csv files type: behavior versions: Python 2.5 Added file: http://bugs.python.org/file9416/listB2Mforblast.csv __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2078> __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue2078] CSV Sniffer does not function properly on single column .csv files
Jean-Philippe Laverdure <[EMAIL PROTECTED]> added the comment: Hello and sorry for the late reply. Wolfgang: sorry about my misuse of the csv.DictReader constructor, that was a mistake on my part. However, it still is not functioning as I think it should/could. Look at this: using this content:

Sequence
AAGINRDSL
AAIANHQVL

and this piece of code:

f = open(sys.argv[-1], 'r')
dialect = csv.Sniffer().sniff(f.readline())
f.seek(0)
reader = csv.DictReader(f, dialect=dialect)
for line in reader:
    print line

I get this result:

{'Sequen': 'AAGINRDSL', 'e': None}
{'Sequen': 'AAIANHQVL', 'e': None}

when I really should be getting this:

{'Sequence': 'AAGINRDSL'}
{'Sequence': 'AAIANHQVL'}

The fact is this code is in use in an application where users can submit a .csv file produced by Excel for treatment. The file must contain a "Sequence" column, since that is what the treatment is run on. Now I had to make the following changes to my code to account for the fact that some users submit a single-column file (since only the "Sequence" column is required for treatment):

f = open(sys.argv[-1], 'r')
try:
    dialect = csv.Sniffer().sniff(f.readline(), [',', '\t'])
    f.seek(0)
    reader = csv.DictReader(f, dialect=dialect)
except:
    print '>>>caught csv sniff() exception'
    f.seek(0)
    reader = csv.DictReader(f)
for line in reader:
    # do what I need to do

which really feels like a patched use of a buggy implementation of the Sniffer class. I understand the issues raised by Skip in regard to figuring out a delimiter at all costs... But really, the Sniffer class should work appropriately when a single-column .csv file is submitted. __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2078> __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue2078] CSV Sniffer does not function properly on single column .csv files
Jean-Philippe Laverdure <[EMAIL PROTECTED]> added the comment: Hi Skip, You're right, it does seem that using f.read(1024) to feed the sniffer works OK in my case and allows me to instantiate the DictReader correctly... Why that is, I'm not sure though... I was submitting the first line as I thought it was the right sample to provide the sniffer for it to sniff the correct dialect regardless of the file format and file content. And yes, 'except csv.Error' is certainly a better way to trap my desired exception... I guess I'm a bit of a n00b using Python. Thanks for the help. Python really has a great community! __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2078> __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
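Putting the two fixes from this exchange together (feed the sniffer a larger sample than a single line, and catch csv.Error instead of a bare except), the pattern looks like this in current Python 3 form; the filename and delimiter set are just the ones from this thread:

    import csv

    with open("listB2Mforblast.csv", newline="") as f:
        sample = f.read(1024)              # a bigger sample than one line
        f.seek(0)
        try:
            dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
            reader = csv.DictReader(f, dialect=dialect)
        except csv.Error:                  # single-column files may still defeat the sniffer
            f.seek(0)
            reader = csv.DictReader(f)     # fall back to the default dialect
        for row in reader:
            print(row)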
[issue45557] Issue 42914
New submission from Jean-Philippe VINCENT: Hello, I just tried the new underscore_numbers argument with pprint, and it doesn't work for me. I'm working on Windows. (screenshot attached as image.png) Best regards, Jean-Philippe -- files: image.png messages: 404636 nosy: jpvincent priority: normal severity: normal status: open title: Issue 42914 Added file: https://bugs.python.org/file50385/image.png ___ Python tracker <https://bugs.python.org/issue45557> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue45557] pprint -> underscore_numbers argument not working
Change by Jean-Philippe VINCENT : -- title: Issue 42914 -> pprint -> underscore_numbers argument not working ___ Python tracker <https://bugs.python.org/issue45557> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
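A minimal repro for this report, assuming Python 3.10+ (the release that introduced the parameter; older versions reject the keyword with a TypeError). If the output lacks the underscores, it reproduces the reported bug:

    import pprint

    pprint.pprint([10 ** 9, 1234567], underscore_numbers=True)
    # expected: [1_000_000_000, 1_234_567]

    # The class-based spelling of the same thing:
    pprint.PrettyPrinter(underscore_numbers=True).pprint([10 ** 9, 1234567])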
[issue31953] Dedicated place for security announcements?
New submission from Jean-Philippe Ouellet: Hello, My apologies if this is not the right place to discuss this. I would like to ensure that I stay informed of any potential future security issues in Python (specifically at least the CPython runtime and standard library, although select very popular 3rd-party libraries wouldn't hurt). I cannot find a single place where such announcements are guaranteed to land. Good examples of the type of thing I am looking for are the openssl-announce list [1][2] and the golang-announce list [3], where the projects pre-announce "Hey, we're going to have a release on <date> which addresses a security issue in <component>." and then announce again when patches are available, such that responsible maintainers (such as I am trying to be) can ensure that updates are available to our users ASAP. The python-announce-list [4] does not serve this purpose because it has lots of noise from initial release announcements about random 3rd-party stuff, and the "security news" page [5] is really just a "how to disclose vulns" page. Note that I'm *not* advocating for the creation of a pre-disclosure list! Python is such a ubiquitous piece of software that I don't think it's reasonable to expect that such a list could contain all affected parties without also leaking details to those who would cause harm. I'm only asking for something public that I can subscribe to in order to be sure I'll have a heads-up when patching is imminently required. Regards, Jean-Philippe (a contributor to the Qubes OS project [6], whose security relies mostly on Python's and Xen's - and is on Xen's pre-disclosure list) [1]: https://mta.openssl.org/pipermail/openssl-announce/2017-October/thread.html [2]: https://mta.openssl.org/pipermail/openssl-announce/2017-November/thread.html [3]: https://groups.google.com/forum/#!forum/golang-announce [4]: https://mail.python.org/mailman/listinfo/python-announce-list [5]: https://www.python.org/news/security/ [6]: https://www.qubes-os.org/ -- assignee: docs@python components: Documentation, email messages: 305614 nosy: barry, docs@python, jpo, r.david.murray priority: normal severity: normal status: open title: Dedicated place for security announcements? type: security ___ Python tracker <https://bugs.python.org/issue31953> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue31953] Dedicated place for security announcements?
Jean-Philippe Ouellet added the comment: Ah, I now see there actually *is* a security-announce list [1]! Unless one happens to already know that Python has two concurrent mailman instances hosting different lists [2][3], it's easy to miss. Thanks, and sorry for the noise! [1]: https://mail.python.org/mm3/archives/list/security-annou...@python.org/ [2]: https://mail.python.org/mm3/archives/ [3]: https://mail.python.org/mailman/listinfo -- ___ Python tracker <https://bugs.python.org/issue31953> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue28547] Python to use Windows Certificate Store
Change by Jean-Philippe Landry : -- resolution: -> third party stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue28547> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue28547] Python to use Windows Certificate Store
New submission from Jean-Philippe Landry: Hello, Would it be possible for Python to use the certificate store in Windows instead of a predetermined list of certificates? The use case is as follows: multiple machines are on a corporate network where there is man-in-the-middle packet inspection (IT security stuff...) that will re-sign most of the SSL connections with its own certificate, which is unfortunately not part of the Python default store. There are also multiple behind-the-firewall servers using self-signed certificates. That means that almost all SSL requests, including pip install, will throw the famous [SSL: CERTIFICATE_VERIFY_FAILED] error. This is transparent in Chrome because Chrome uses the Windows store to determine whether a certificate is trusted or not, and all those custom certificates are in the Windows store. However, Python uses its own file (list of approved certificates). I understand that this can be overridden using a custom, manually managed .crt file set in the environment variables (REQUESTS_CA_BUNDLE), and it works. However, this involves manual operation and undesired maintenance whenever a new certificate is added to the store. The Windows store itself gets updated periodically by IT, so it is not an issue. Is there a rationale behind using a specific file instead of the Windows store, which works for Chrome, IE, etc.? Best regards, Jean-Philippe -- assignee: christian.heimes components: SSL messages: 279602 nosy: Jean-Philippe Landry, christian.heimes priority: normal severity: normal status: open title: Python to use Windows Certificate Store type: behavior versions: Python 3.5 ___ Python tracker <http://bugs.python.org/issue28547> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
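Two points worth checking on the Python side. First, ssl.create_default_context() already calls load_default_certs(), which on Windows pulls CA certificates from the system's CA and ROOT stores, so the stdlib can see the corporate certificate if IT installs it into one of those stores; the pip failures typically come from pip shipping its own bundled certificate list rather than using the stdlib defaults. Second, the stores can be read explicitly via ssl.enum_certificates (Windows-only, available since Python 3.4). A sketch:

    import ssl

    ctx = ssl.create_default_context()   # on Windows this already consults
                                         # the CA and ROOT system stores
    # Explicit version of the same loading, shown for illustration:
    for store in ("CA", "ROOT"):
        for cert, encoding, trust in ssl.enum_certificates(store):
            if encoding != "x509_asn":   # keep plain DER-encoded certificates only
                continue
            try:
                ctx.load_verify_locations(cadata=cert)
            except ssl.SSLError:
                pass                     # certificate was already loaded above
    # ctx can then be passed to e.g. urllib.request.urlopen(url, context=ctx)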