[issue10551] mimetypes reading from registry in windows completely broken
New submission from Kovid Goyal:

Hi, I am the primary developer of calibre (http://calibre-ebook.com) and yesterday I released an upgrade of calibre based on Python 2.7. Here is a small sampling of the diverse errors that my users experienced, all related to reading mimetypes from the registry:

1. Permission denied when running from a non-privileged account

    Traceback (most recent call last):
      File "site.py", line 103, in main
      File "site.py", line 84, in run_entry_point
      File "site-packages\calibre\__init__.py", line 31, in <module>
      File "mimetypes.py", line 344, in add_type
      File "mimetypes.py", line 355, in init
      File "mimetypes.py", line 261, in read_windows_registry
    WindowsError: [Error 5] Acceso denegado (Access not allowed)

The fix for this is to trap WindowsError and ignore it in mimetypes.py.

2. Mishandling of the encoding of registry entries

    Traceback (most recent call last):
      File "site.py", line 103, in main
      File "site.py", line 84, in run_entry_point
      File "site-packages\calibre\__init__.py", line 31, in <module>
      File "mimetypes.py", line 344, in add_type
      File "mimetypes.py", line 355, in init
      File "mimetypes.py", line 260, in read_windows_registry
      File "mimetypes.py", line 250, in enum_types
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: invalid continuation byte

The fix for this is to change "except UnicodeEncodeError" to "except ValueError".

3. Wrong results for standard extensions

    python -c "import mimetypes; print mimetypes.guess_type('img.jpg')"
    ('image/pjpeg', None)

The output should have been ('image/jpeg', None). The fix for this is to load the registry entries before the default entries defined in mimetypes.py, so that the defaults take precedence.

Of course, IMHO, the best possible fix is to simply remove the reading of mimetypes from the registry. But that is up to whoever maintains this module.

Duplicate (less comprehensive) tickets on this issue already in your tracker are: 9291, 10490, 104314

If the maintainer of this module is unable to fix these issues, let me know and I will submit a patch, either removing _winreg or fixing the issues individually.

-- components: Library (Lib) messages: 122542 nosy: kovid priority: normal severity: normal status: open title: mimetypes reading from registry in windows completely broken versions: Python 2.7 ___ Python tracker <http://bugs.python.org/issue10551> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
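The ordering fix proposed in point 3 can be illustrated with plain dictionaries. This is only a sketch: `merge_type_maps` and the sample tables are hypothetical names, not code from mimetypes.py, and the real module keeps more state than a single map.

```python
def merge_type_maps(registry_types, default_types):
    """Return one extension -> MIME type map in which the defaults win."""
    # Start from the (untrusted) registry entries ...
    merged = dict(registry_types)
    # ... then apply the built-in defaults last, so they overwrite conflicts.
    merged.update(default_types)
    return merged


# Hypothetical sample data, modelled on the report above.
registry = {'.jpg': 'image/pjpeg', '.xyz': 'application/x-xyz'}
defaults = {'.jpg': 'image/jpeg', '.png': 'image/png'}
merged = merge_type_maps(registry, defaults)
```

With this ordering the registry can still contribute extensions the defaults do not know about ('.xyz' here), but it cannot change the answer for a standard extension such as '.jpg'.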
[issue10551] mimetypes reading from registry in windows completely broken
Kovid Goyal added the comment:

And what about the third issue? Allow me to elaborate: mimetypes are a relatively standard set of mappings from well-known file extensions to MIME descriptors. Reading mimetype mappings from the registry, a location that is writable by random programs the user may have installed on his machine, let alone malware, is a BAD idea. It leads to situations like asking for the mimetype of file.jpg and getting image/pjpeg back, or asking for the mimetype of file.png and getting image/x-png back.

If you still consider it good to read mimetypes from the registry, then at the very least they should be read before the standard mimetype mappings defined in mimetypes.py are applied. That way, at least for that set of mappings, users of Python can be assured of sane query results.

As it stands now, mimetypes.py is useless, and to work around the problem I essentially had to define by hand the mappings for all the mimetypes my program knows about.

-- resolution: duplicate -> status: closed -> open
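The workaround described above, a hand-maintained table consulted before the standard module, might look roughly like the following. This is a sketch and not calibre's actual code: `KNOWN_TYPES` and `guess_type` are hypothetical names, and the table is deliberately tiny.

```python
import mimetypes
import posixpath

# Hypothetical hand-maintained table; a real application would list every
# type it cares about.
KNOWN_TYPES = {
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png',
    '.gif': 'image/gif',
}


def guess_type(path):
    """Return (type, encoding), preferring the hand-maintained table."""
    ext = posixpath.splitext(path)[1].lower()
    if ext in KNOWN_TYPES:
        return KNOWN_TYPES[ext], None
    # Anything not in the table falls back to the standard module.
    return mimetypes.guess_type(path)
```

For the extensions in the table, the result no longer depends on what is stored in the registry; only unknown extensions reach `mimetypes.guess_type`.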
[issue10551] mimetypes read from the registry should not overwrite standard mime mappings
Kovid Goyal added the comment:

I apologize for the multiple issues in the ticket. To my mind they were all basically one issue, stemming from the decision to read mimetypes from the registry. Since there are other tickets for the first two issues, I'll change the summary for this issue to reflect only the third.

-- title: mimetypes reading from registry in windows completely broken -> mimetypes read from the registry should not overwrite standard mime mappings
[issue10551] mimetypes read from the registry should not overwrite standard mime mappings
Kovid Goyal added the comment:

It is, of course, your decision, but IMO, since the mimetypes database in Windows appears to be always broken, the default behavior of the mimetypes module in Python 2.7 on Windows is broken for most (all?) Windows installs. For me personally, it doesn't matter anymore, as I have already fixed calibre, but it would be surprising/unexpected behavior for someone new to using mimetypes.py on Windows. Certainly, my expectation (perhaps naive) was that guess_type('image.jpg') would always return 'image/jpeg'.

Users on Windows rarely (ever?) modify the registry to change mimetypes. The only thing that does change mimetypes is installed software, without the users' knowledge or consent. So treating the registry as a reliable store of MIME information is not a good idea.

On Unix, the knownfiles are system files. I don't know about OS X, but on Linux, since most software is installed by package managers, the package managers usually have policies that prevent application installs from clobbering system files. And of course, running userland applications don't have the necessary privileges to modify the files.

Out of curiosity, what is the upside of reading mimetypes from the registry, given that its information cannot be trusted?

And you're most welcome, for calibre :)
[issue10551] mimetypes read from the registry should not overwrite standard mime mappings
Kovid Goyal added the comment:

I actually had in mind people that (like me) develop primarily on unix and assume that mimetypes works the same way on both windows and unix. Of course, the changed behavior is also a concern. At the very least, I would encourage the addition of a warning to the documentation of the mimetypes module.

-- status: pending -> open
[issue38828] cookiejar.py broken in 3.8
New submission from Kovid Goyal:

In Python 3.8, cookiejar.py is full of code that compares cookie.version to integers, which raises an exception when cookie.version is None. See, for example, set_ok_version() and set_ok_path(). Both the Cookie constructor and _cookie_from_cookie_tuple() explicitly assume version can be None, and setting version to None worked fine in previous Python releases.

-- components: Library (Lib) messages: 356797 nosy: kovid priority: normal severity: normal status: open title: cookiejar.py broken in 3.8 type: crash versions: Python 3.8 ___ Python tracker <https://bugs.python.org/issue38828> ___
[issue38828] cookiejar.py broken in 3.8
Kovid Goyal added the comment:

The issue is obvious from a simple glance at the code. Either the Cookie constructor needs to change version = None to zero (or some other integer), or the various methods in that module need to handle a None version. I don't personally care about this issue any more since I have worked around it in my code; feel free to fix it or not, as you wish.
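A caller-side version of the first option (never handing cookiejar a None version) could be sketched as follows. `make_cookie` is a hypothetical helper and not part of http.cookiejar, and treating a missing version as 0 (a Netscape-style cookie) is an assumption that may not suit every application.

```python
from http.cookiejar import Cookie, CookieJar
from urllib.request import Request


def make_cookie(name, value, domain, path='/', version=None):
    """Build a Cookie, coercing a missing version to 0 (Netscape style)."""
    if version is None:
        version = 0
    return Cookie(
        version, name, value,
        None, False,                           # port, port_specified
        domain, True, domain.startswith('.'),  # domain, domain_specified, domain_initial_dot
        path, True,                            # path, path_specified
        False, None, False,                    # secure, expires, discard
        None, None, {},                        # comment, comment_url, rest
    )


jar = CookieJar()
jar.set_cookie(make_cookie('test', 'test', '.test.com'))
request = Request('http://www.test.com')
jar.add_cookie_header(request)  # performs integer comparisons on cookie.version
```

Because the version is always an integer by the time the cookie reaches the jar, the comparisons inside the cookie policy never see None.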
[issue38828] http.cookiejar handle cookie.version to be None
Kovid Goyal added the comment:

It's trivially True that it is a regression from python 2 since in python 2 comparison to None is fine. Whether it ever worked in any python 3 version before 3.8 I'm not sure about.
[issue38828] http.cookiejar handle cookie.version to be None
Kovid Goyal added the comment:

Here's a trivial script to reproduce:

    from urllib.request import Request
    from http.cookiejar import Cookie, CookieJar

    jar = CookieJar()
    jar.set_cookie(Cookie(
        None, 'test', 'test', None, False, '.test.com', True, False, '/',
        True, False, None, False, None, None, None
    ))
    r = Request('http://www.test.com')
    jar.add_cookie_header(r)
[issue16512] imghdr doesn't support jpegs with an ICC profile
Kovid Goyal added the comment:

FYI, the test I currently use in calibre, which has not failed so far for millions of users:

    def test_jpeg(h, f):
        if (h[6:10] in (b'JFIF', b'Exif')) or (h[:2] == b'\xff\xd8' and (b'JFIF' in h[:32] or b'8BIM' in h[:32])):
            return 'jpeg'

-- ___ Python tracker <http://bugs.python.org/issue16512> ___
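To see what the test above accepts, here is a standalone sketch that applies the same conditions to a few hand-built headers. The function is renamed `looks_like_jpeg` (a hypothetical name) so that it can be run on its own, and the sample headers are synthetic rather than taken from real files.

```python
def looks_like_jpeg(h, f=None):
    """Return 'jpeg' if the header bytes h look like a JPEG, else None."""
    # Classic JFIF/Exif marker at the usual offset.
    if h[6:10] in (b'JFIF', b'Exif'):
        return 'jpeg'
    # JPEG start-of-image bytes plus a JFIF or Photoshop (8BIM) marker
    # somewhere in the first 32 bytes.
    if h[:2] == b'\xff\xd8' and (b'JFIF' in h[:32] or b'8BIM' in h[:32]):
        return 'jpeg'
    return None


# Synthetic headers, built by hand for illustration only.
jfif_header = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01'
photoshop_header = b'\xff\xd8\xff\xed\x00\x1cPhotoshop 3.0\x008BIM\x04\x04'
png_header = b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR'
```

The second header has no JFIF or Exif marker at offset 6, so it is only recognized through the 8BIM check.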
[issue16512] imghdr doesn't recognize variant jpeg formats
Kovid Goyal added the comment:

You cannot assume the file-like object passed to imghdr is seekable. And IMO it is not the job of imghdr to check file validity, especially since it does not do that for all formats.
[issue16512] imghdr doesn't support jpegs with an ICC profile
Kovid Goyal added the comment:

The attached patch is insufficient; for example, it fails on http://nationalpostnews.files.wordpress.com/2013/03/budget.jpeg?w=300&h=1571

Note that the Linux file utility identifies a file as "JPEG Image data" if the first two bytes of the file are \xff\xd8.

A slightly stricter test that catches more JPEG files:

    def test_jpeg(h, f):
        if (h[6:10] in (b'JFIF', b'Exif')) or (h[:2] == b'\xff\xd8' and b'JFIF' in h[:32]):
            return 'jpeg'

-- nosy: +kovid
[issue15500] Python should support naming threads
Kovid Goyal added the comment:

Just FYI, a pure python2 implementation that monkey patches Thread.start() to set the OS level thread name intelligently.

    import ctypes, ctypes.util, threading

    libpthread_path = ctypes.util.find_library("pthread")
    if libpthread_path:
        libpthread = ctypes.CDLL(libpthread_path)
        if hasattr(libpthread, "pthread_setname_np"):
            pthread_setname_np = libpthread.pthread_setname_np
            pthread_setname_np.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
            pthread_setname_np.restype = ctypes.c_int
            orig_start = threading.Thread.start

            def new_start(self):
                orig_start(self)
                try:
                    name = self.name
                    if not name or name.startswith('Thread-'):
                        name = self.__class__.__name__
                        if name == 'Thread':
                            name = self.name
                    if name:
                        if isinstance(name, unicode):
                            name = name.encode('ascii', 'replace')
                        ident = getattr(self, "ident", None)
                        if ident is not None:
                            pthread_setname_np(ident, name[:15])
                except Exception:
                    pass  # Don't care about failure to set name

            threading.Thread.start = new_start

-- nosy: +kovid ___ Python tracker <http://bugs.python.org/issue15500> ___
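For readers on Python 3, the same idea can be sketched without monkey patching. This is an untested-on-all-platforms sketch that assumes a Linux/glibc-style `pthread_setname_np(thread, name)`; the helper names `os_thread_name` and `set_os_thread_name` and the `Worker` class are hypothetical.

```python
import ctypes
import ctypes.util
import threading


def os_thread_name(thread):
    """Pick a name for the OS-level thread, as at most 15 ASCII bytes."""
    name = thread.name
    if not name or name.startswith('Thread-'):
        # Default names are uninformative; prefer the subclass name.
        cls_name = type(thread).__name__
        if cls_name != 'Thread':
            name = cls_name
    return name.encode('ascii', 'replace')[:15]


def set_os_thread_name(thread):
    """Try to name an already started thread; return True on success."""
    path = ctypes.util.find_library('pthread')
    if not path or thread.ident is None:
        return False
    try:
        func = ctypes.CDLL(path).pthread_setname_np
    except (OSError, AttributeError):
        return False  # library or symbol not available on this platform
    func.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    func.restype = ctypes.c_int
    return func(thread.ident, os_thread_name(thread)) == 0


class Worker(threading.Thread):
    """Example subclass whose class name becomes the OS thread name."""
```

A caller would start the thread first and then call `set_os_thread_name(thread)` while it is still running, ignoring a False result just as the original ignores failures.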
[issue25759] Python 2.7.11rc1 not building with Visual Studio 2015
New submission from Kovid Goyal:

The PCbuild/readme.txt file implies that it is possible to build Python 2.7.11rc1 with Visual Studio 2015 (although it is not officially supported). However, there are at least a couple of problems that I have encountered so far:

1) timemodule.c uses timezone, tzname and daylight, which are no longer defined in Visual Studio. As a quick, hackish workaround, one can do:

    #if defined _MSC_VER && _MSC_VER >= 1900
    #define timezone _timezone
    #define tzname _tzname
    #define daylight _daylight
    #endif

2) More serious: the code in posixmodule.c that checks whether file descriptors are valid no longer links, since it relies on an internal structure from the Microsoft DLLs, __pioinfo, that no longer exists. See https://bugs.python.org/issue23524 for discussion about this in the Python 3.x branch. As a quick and dirty fix, one could just replace _PyVerify_fd with a stub implementation that does nothing for _MSC_VER >= 1900. However, a proper fix should probably be made.

-- components: Interpreter Core messages: 20 nosy: kovidgoyal priority: normal severity: normal status: open title: Python 2.7.11rc1 not building with Visual Studio 2015 type: compile error versions: Python 2.7 ___ Python tracker <http://bugs.python.org/issue25759> ___
[issue25759] Python 2.7.11rc1 not building with Visual Studio 2015
Kovid Goyal added the comment:

OK, I had hoped to avoid having to maintain my own fork of python 2 for a while longer, but, I guess not. Could you at least tell me if there are any other issues I should be aware of, to avoid me having to search through the python 3 sourcecode/commit history. I will be happy to make my work public so others can benefit from it as well.
[issue25759] Python 2.7.11rc1 not building with Visual Studio 2015
Kovid Goyal added the comment:

I have it building with just two simple patches:

https://github.com/kovidgoyal/cpython/commit/fd1ceca4f21135f12ceb72f37d4ac5ea1576594d
https://github.com/kovidgoyal/cpython/commit/edb740218c04b38aa0f385188103100a972d608c

However, in developing the patches, I discovered what looks like a bug in the CRT close() function. If you double close a valid file descriptor, it crashes rather than calling the invalid parameter handler. For example, this crashes:

    python -c "import os; os.close(2); os.close(2)"

This is true for Python 2.7.10 built against VS 2008 as well. This contrasts with the behavior of double close() on other operating systems, where it sets errno to EBADF and does not crash. I have not tested it with Python 3.5, but I assume the bug is present there as well.

-- components: -Build, Windows
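The non-crashing behavior described for other operating systems can be checked with a few lines of Python. This sketch uses a throwaway pipe instead of descriptor 2 so that stderr stays usable; `double_close_errno` is a hypothetical helper name.

```python
import errno
import os


def double_close_errno():
    """Close a descriptor twice and return the errno of the second close."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    os.close(write_fd)
    try:
        os.close(write_fd)  # second close of the same, now invalid, descriptor
    except OSError as exc:
        return exc.errno
    return None
```

On a POSIX system the second close should fail cleanly, so the function is expected to return `errno.EBADF` rather than bring down the interpreter.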
[issue25759] Python 2.7.11rc1 not building with Visual Studio 2015
Kovid Goyal added the comment:

I missed a few places in my initial patch. Updated patch: https://github.com/kovidgoyal/cpython/commit/a9ec814d466d3c0139d10b69666f88eed10e4940

While I was there, I also fixed the code not clearing errno before calling CRT functions. Whether or not you want to allow Python 2.7 to be compiled with VS 2015, I suggest you consider merging this patch anyway, since the errno clearing is the correct thing to do. You can always cherry-pick the errno clearing bits if you like :)

Just FYI, the code in my fork of 2.7 passes all tests on 64-bit builds with VS 2015, except for 5 small ones that I have yet to track down (test_ctypes test_distutils test_gzip test_mailbox test_zipfile). I don't anticipate any difficulty in fixing the remaining test failures. Famous last words ;)
[issue25759] Python 2.7.11rc1 not building with Visual Studio 2015
Kovid Goyal added the comment:

Yes, I am aware. I embed Python in my application, which includes large C++ libraries. Those libraries are going to start requiring a modern compiler soon, which means I need Python to also be compiled with a modern compiler. I already manually compile all Python extensions in my build system, so that is not a problem. And before someone suggests I upgrade to Python 3: porting half a million lines of Python is simply not worth it for me.

I'll be happy to open a separate bug report, but first I want some advice. I have got all the other tests passing as well, except one single test: test_gzip.test_many_append. That test apparently fails because of a buffering bug in the stdio C functions in VS 2015. Combining lots of seeks relative to SEEK_CUR causes read() to return incorrect data. I can make the test pass by modifying the gzip module to open files with buffering=0, or by putting in a seek(0, 0) to cause the stdio layer to flush its read buffer at the appropriate point. However, this is not an actual fix, just an inefficient workaround.

My question is, how do I properly work around this bug? And how come this bug is not triggered in Python 3.5.0? Am I diagnosing this correctly? Are there any alternative explanations?
[issue25759] Python 2.7.11rc1 not building with Visual Studio 2015
Kovid Goyal added the comment:

To answer part of my question, the reason the fseek()+fread() bug does not affect python 3.5.0 appears to be because it implements its own buffering and does not use fseek()/fread() at all. Sigh, I really hope the answer does not end up being that I have to re-implement fseek()/ftell()/fread()/fwrite() using lseek()/read()/write() on windows. Or I could wait and hope Microsoft fixes the bug :)

As a first step, to confirm that the bug is in the CRT, I'll have the gzip module record all reads/seeks/tells and then see if I can reproduce the bug in a plain C program.
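One way to record the reads, seeks and tells that the gzip module issues is to wrap the underlying file object. The sketch below uses Python 3 and an in-memory file purely for illustration; it is not the instrumentation used in the fork, and `RecordingFile` is a hypothetical class.

```python
import gzip
import io


class RecordingFile:
    """Wrap a binary file object and log every read/seek/tell call."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self.log = []  # entries are (operation, args, result summary)

    def read(self, *args):
        data = self._fileobj.read(*args)
        self.log.append(('read', args, len(data)))
        return data

    def seek(self, *args):
        pos = self._fileobj.seek(*args)
        self.log.append(('seek', args, pos))
        return pos

    def tell(self):
        pos = self._fileobj.tell()
        self.log.append(('tell', (), pos))
        return pos

    def __getattr__(self, name):
        # Delegate everything else (close, seekable, ...) unchanged.
        return getattr(self._fileobj, name)


payload = b'hello gzip ' * 100
recorder = RecordingFile(io.BytesIO(gzip.compress(payload)))
with gzip.GzipFile(fileobj=recorder, mode='rb') as gz:
    roundtrip = gz.read()
```

Afterwards `recorder.log` holds the call sequence in order, which could then be replayed against fseek()/fread() in a small C program.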
[issue25759] Python 2.7.11rc1 not building with Visual Studio 2015
Kovid Goyal added the comment:

Doesn't seem like a bug in the CRT, I cannot reproduce in a plain CRT program, so now I get to try to figure out what is broken in fileobject.c by VS 2015. That's a relief :)
[issue25759] Python 2.7.11rc1 not building with Visual Studio 2015
Kovid Goyal added the comment:

I take it back: my methodology in reproducing the function calls used by the gzip module was flawed. It does look like a bug in the CRT, but I have not been able to isolate a simple way of reproducing it. I have, however, found a workaround for it that has an acceptable performance impact:

https://github.com/kovidgoyal/cpython/commit/72ae720ab057b1ac0402d67a7195d575d34afbbd

Now all tests pass (except for tcl/tk and distutils, neither of which I care about -- well, I will probably need to fix up distutils at some point, but not now :). I am running the test suite as:

    ./PCbuild/amd64/python_d.exe Lib/test/regrtest.py -u network,cpu,subprocess,urlfetch

@steve: Thank you for all the work you did porting Python 3.x to VS 2015; that certainly made my life a lot easier. I would, of course, be ecstatic if you were to consider merging my work into the Python 2.7 branch, but if not, I understand -- no one likes to maintain a legacy codebase.

In any case, for interested third parties, my work is available here: https://github.com/kovidgoyal/cpython (2.7 branch), and instructions on building Python on Windows using a nice cygwin environment are here: https://github.com/kovidgoyal/calibre/blob/master/setup/installer/windows/notes2.rst
[issue25759] Python 2.7.11rc1 not building with Visual Studio 2015
Kovid Goyal added the comment:

No worries, as I said, I understand, I would probably do the same, were I in your shoes. I have found that being a maintainer of a complex software project tends to naturally increase conservatism :)
[issue28591] imghdr doesn't recognize some jpeg formats
Kovid Goyal added the comment:

FYI, the up-to-date version of imghdr I maintain is here: https://github.com/kovidgoyal/calibre/blob/master/src/calibre/utils/imghdr.py

It uses memoryview for performance and can also read image sizes from file headers for jpeg, png, gif and jpeg2000. Note that it is only tested on Python 2.7.

I'm afraid I don't have the time to shepherd it through your review process, but feel free to take code from it if you want to. It is licensed GPLv3, but I am willing to re-license it under another license if needed, as I am the sole contributor.

-- ___ Python tracker <http://bugs.python.org/issue28591> ___
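As an illustration of reading an image size from a file header through a memoryview, here is a sketch for PNG only. It is not taken from the calibre module; `png_size` is a hypothetical function that relies on the PNG layout of an 8-byte signature followed by the IHDR chunk, whose first two fields are the big-endian width and height.

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def png_size(header):
    """Return (width, height) from the first 24 bytes of a PNG, else None."""
    view = memoryview(header)  # slicing a memoryview does not copy the data
    if len(view) < 24:
        return None
    if bytes(view[:8]) != PNG_SIGNATURE or bytes(view[12:16]) != b'IHDR':
        return None
    # Width and height are two unsigned 32-bit big-endian integers at offset 16.
    return struct.unpack_from('>II', view, 16)


# Hand-built header for a hypothetical 640x480 image.
sample = PNG_SIGNATURE + struct.pack('>I', 13) + b'IHDR' + struct.pack('>II', 640, 480)
```

Only the first 24 bytes are needed, so the same header read that identifies the format can also yield the dimensions without seeking.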