[issue46598] ElementTree: wrong XML prolog for the utf-8-sig encoding
New submission from Petr Prikryl:

When an ElementTree object is written to a file and a BOM is needed, the 'utf-8-sig' encoding can be used for the purpose. However, the XML prolog then looks like...

    <?xml version='1.0' encoding='utf-8-sig'?>

... and that encoding in the prolog makes no sense. Therefore, utf-8-sig should be changed to utf-8 in the declaration.

To fix the situation, the following two lines should be added to `cpython/Lib/xml/etree/ElementTree.py`

    elif enc_lower == "utf-8-sig":
        declared_encoding = "utf-8"

just above line 741, which says

    write("<?xml version='1.0' encoding='%s'?>\n" % (
          declared_encoding,))

I have already cloned the main branch, added the lines to `https://github.com/pepr/cpython.git`, and sent a pull request.

I have tested the functionality locally with `Python 3.10.2 (tags/v3.10.2:a58ebcc, Jan 17 2022, 14:12:15) [MSC v.1929 64 bit (AMD64)] on win32`

--
components: Library (Lib)
messages: 412247
nosy: prikryl
priority: normal
pull_requests: 29231
severity: normal
status: open
title: ElementTree: wrong XML prolog for the utf-8-sig encoding
versions: Python 3.10

___
Python tracker <https://bugs.python.org/issue46598>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
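A possible workaround that does not depend on the proposed patch is sketched below. It is only an illustration, not part of the submission: the helper name `write_with_bom` is hypothetical, and it assumes the caller has a binary stream to write to. The BOM is written explicitly and the tree is then serialized with the plain 'utf-8' encoding, so the declaration names 'utf-8'.

```python
import codecs
import io
import xml.etree.ElementTree as ET


def write_with_bom(tree, stream):
    """Write `tree` to a binary stream as UTF-8 with a leading BOM.

    Hypothetical helper: the BOM is emitted by hand, and ElementTree is
    asked for plain 'utf-8', so the prolog declares encoding='utf-8'.
    """
    stream.write(codecs.BOM_UTF8)
    tree.write(stream, encoding="utf-8", xml_declaration=True)


# Usage with an in-memory stream; a file opened in 'wb' mode works the same way.
tree = ET.ElementTree(ET.Element("root"))
buf = io.BytesIO()
write_with_bom(tree, buf)
data = buf.getvalue()
```

The result starts with the three BOM bytes, followed by a prolog that names 'utf-8' rather than 'utf-8-sig'.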
[issue1145257] shutil.copystat() may fail...
Petr Prikryl added the comment:

Well, it is quite an old event. Anyway, I have fixed the simple example and launched it on Python 2.6, 2.7, 3.2, and 3.3. It does not fail now, but I did not test it heavily.

From my point of view, it was probably fixed.

--
___
Python tracker <http://bugs.python.org/issue1145257>
___
[issue16322] time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding
Petr Prikryl added the comment:

I have just observed the behaviour for the Czech locale. I tried to avoid collisions with the stdout encoding by writing the strings into a file using the UTF-8 encoding:

tzname_bug.py
---
#!python3
import time
import sys

with open('tzname_bug.txt', 'w', encoding='utf-8') as f:
    f.write(sys.version + '\n')
    f.write('Should be: Střední Evropa (běžný čas) | Střední Evropa (letní čas)\n')
    f.write('but it is: ' + time.tzname[0] + ' | ' + time.tzname[1] + '\n')
    f.write('types: ' + repr(type(time.tzname[0])) + ' | ' + repr(type(time.tzname[1])) + '\n')
    f.write('Should be as ascii: ' + ascii('Střední Evropa (běžný čas) | Střední Evropa (letní čas)') + '\n')
    f.write('but it is as ascii: ' + ascii(time.tzname[0]) + ' | ' + ascii(time.tzname[1]) + '\n')
---

It creates tzname_bug.txt with the following content (copied and pasted from a Unicode-capable editor, Notepad++; the indicator in the bottom right corner shows UTF-8):

---
3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AMD64)]
Should be: Střední Evropa (běžný čas) | Střední Evropa (letní čas)
but it is: Støední Evropa (bìný èas) | Støední Evropa (letní èas)
types: <class 'str'> | <class 'str'>
Should be as ascii: 'St\u0159edn\xed Evropa (b\u011b\u017en\xfd \u010das) | St\u0159edn\xed Evropa (letn\xed \u010das)'
but it is as ascii: 'St\xf8edn\xed Evropa (b\xec\x9en\xfd \xe8as)' | 'St\xf8edn\xed Evropa (letn\xed \xe8as)'
---

--
nosy: +prikryl
Added file: http://bugs.python.org/file40507/tzname_bug.py

___
Python tracker <http://bugs.python.org/issue16322>
___
[issue16322] time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding
Petr Prikryl added the comment:

I have worked around it a bit differently -- the snippet from the code:

---
import locale
import platform
import time

result = time.tzname[0]    # simplified version of the original code.

# Because of the bug in Windows libraries, Python 3.3 tried to work around
# some issues. However, the shit hit the fan, and the bug bubbled here.
# The `time.tzname` elements are (unicode) strings; however, they were
# filled with bad content. See https://bugs.python.org/issue16322 for details.
# Actually, wrong characters were passed instead of the good ones.
# This code should be skipped later by versions of Python that will fix
# the issue.
if platform.system() == 'Windows':
    # The concrete example for the Czech locale:
    # - cp1250 (windows-1250) is used as the native encoding
    # - the time.tzname[0] should start with 'Střední Evropa'
    # - the ascii('Střední Evropa') should return "'St\u0159edn\xed Evropa'"
    # - because of the bug it returns "'St\xf8edn\xed Evropa'"
    #
    # The 'ř' character has the Unicode code point `\u0159` (that is hex)
    # and the `\xF8` code in cp1250. The `\xF8` was wrongly used
    # as the Unicode code point `\u00F8` -- this is the Unicode
    # character 'ø' that is observed in the string.
    #
    # To fix it, the `result` string must be reinterpreted with a different
    # encoding. When working with Python 3 strings, it can probably
    # be done only through the string representation and `eval()`. Here
    # the `eval()` is not very dangerous because the string was obtained
    # from the OS library, and the values are limited to a certain subset.
    #
    # The `ascii()` literal is prefixed with the bytes-literal prefix `b`,
    # `eval`uated, and the bytes result is decoded to the correct string.
    local_encoding = locale.getdefaultlocale()[1]
    b = eval('b' + ascii(result))
    result = b.decode(local_encoding)
---
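The reinterpretation described in the comments above can be checked on any platform with a small sketch. The function names `garble` and `repair` are hypothetical, and cp1250 is assumed as the native encoding, as in the Czech example. Encoding to latin-1 maps each code point below 256 back to the byte with the same value, so it produces the same bytes as the `eval('b' + ascii(...))` step without evaluating a string.

```python
def garble(text, native_encoding):
    """Reproduce the symptom: native bytes misread as code points 0-255."""
    return text.encode(native_encoding).decode('latin-1')


def repair(garbled, native_encoding):
    """Undo it: turn each code point back into a byte, then decode properly."""
    return garbled.encode('latin-1').decode(native_encoding)


# Values taken from the tzname_bug.txt output reported in this issue.
expected = 'St\u0159edn\xed Evropa (b\u011b\u017en\xfd \u010das)'
observed = 'St\xf8edn\xed Evropa (b\xec\x9en\xfd \xe8as)'

fixed = repair(observed, 'cp1250')
```

Note that `repair` raises `UnicodeEncodeError` if the input contains a code point above 255, which would indicate that the string was not garbled in this particular way.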
[issue16322] time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding
Petr Prikryl added the comment:

@eryksun: I see. In my case, I can set the locale before importing the time module. However, the code (asciidoc3.py) will be used as a module, and I cannot know whether the user has already imported the time module.

Instead of your suggestion

    result = result.encode('latin-1').decode('mbcs')

I was thinking of creating a module, say workaround16322.py, like this:

---
import locale
locale.setlocale(locale.LC_ALL, '')

import importlib
import time
importlib.reload(time)
---

I thought that reloading the time module would be the same as importing it later, after setting the locale. If that worked, the module could simply be imported wherever it was needed. However, it does not work when it is imported after the time module has been imported. What is the reason? Does reload() work only for modules coded as Python sources? Is there any other approach that would implement the workaroundXXX.py module?
[issue16322] time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding
Petr Prikryl added the comment:

@eryksun: Thanks for your help. I have finally ended up with your...

"Call setlocale(LC_CTYPE, ''), and then call time.strftime('%Z') to get the timezone name."
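The quoted advice can be sketched as a small helper. The function name `local_timezone_name` is hypothetical, and the `try`/`except` is an added precaution for environments whose default locale is not installed; it is not part of the original suggestion.

```python
import locale
import time


def local_timezone_name():
    """Return the current timezone name as reported by time.strftime('%Z').

    Sketch of the quoted advice: adopt the user's default LC_CTYPE first,
    then ask strftime() for the name instead of reading time.tzname.
    """
    try:
        locale.setlocale(locale.LC_CTYPE, '')
    except locale.Error:
        # Assumption: if the default locale is unsupported, keep the current one.
        pass
    return time.strftime('%Z')


name = local_timezone_name()
```

Because `setlocale()` changes process-wide state, a library module may prefer to call it once at a well-defined point rather than on every lookup.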