[issue46598] ElementTree: wrong XML prolog for the utf-8-sig encoding

2022-02-01 Thread Petr Prikryl


New submission from Petr Prikryl :

When an ElementTree object is written to a file and a BOM is needed, the
'utf-8-sig' encoding can be used for that purpose. However, the XML prolog then
looks like...

<?xml version='1.0' encoding='utf-8-sig'?>

... and that encoding in the prolog makes no sense. Therefore, utf-8-sig
should be changed to utf-8 in the prolog.

To fix the situation, the following two lines should be added to
`cpython/Lib/xml/etree/ElementTree.py`

`elif enc_lower == "utf-8-sig":
 declared_encoding = "utf-8"
`

just above line 741, which says
`write("<?xml version='1.0' encoding='%s'?>\n" % (
   declared_encoding,))`

I have already cloned the main branch, added the lines in
`https://github.com/pepr/cpython.git`, and sent a pull request.

I have tested the functionality locally with `Python 3.10.2 
(tags/v3.10.2:a58ebcc, Jan 17 2022, 14:12:15) [MSC v.1929 64 bit (AMD64)] on 
win32`
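
Until such a change is available, one possible workaround is to write the BOM by hand and let ElementTree declare plain utf-8. The following is only a sketch under that assumption; the helper name `write_xml_with_bom` is hypothetical and is not part of the proposed patch.

```python
import codecs
import io
import xml.etree.ElementTree as ET


def write_xml_with_bom(tree, fileobj):
    """Write `tree` to a binary file object, preceded by a UTF-8 BOM.

    Hypothetical helper: the BOM is written manually, so the encoding
    passed to ElementTree (and declared in the prolog) stays 'utf-8'.
    """
    fileobj.write(codecs.BOM_UTF8)
    tree.write(fileobj, encoding='utf-8', xml_declaration=True)


# Usage with an in-memory buffer; a file opened in 'wb' mode works the same way.
tree = ET.ElementTree(ET.Element('root'))
buf = io.BytesIO()
write_xml_with_bom(tree, buf)
data = buf.getvalue()
print(data)
```

The output starts with the three BOM bytes, followed by a prolog that names utf-8 rather than utf-8-sig.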

--
components: Library (Lib)
messages: 412247
nosy: prikryl
priority: normal
pull_requests: 29231
severity: normal
status: open
title: ElementTree: wrong XML prolog for the utf-8-sig encoding
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue46598>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1145257] shutil.copystat() may fail...

2013-01-27 Thread Petr Prikryl

Petr Prikryl added the comment:

Well, it is quite an old event. Anyway, I have fixed the simple example and
ran it on Python 2.6, 2.7, 3.2, and 3.3. It does not fail now, but I did not
test it heavily.

From my point of view, it was probably fixed.

--

___
Python tracker 
<http://bugs.python.org/issue1145257>
___



[issue16322] time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding

2015-09-18 Thread Petr Prikryl

Petr Prikryl added the comment:

I have just observed behaviour for the Czech locale. I tried to avoid 
collisions with stdout encoding, writing the strings into a file using UTF-8 
encoding:

tzname_bug.py
--
#!python3
import time
import sys
with open('tzname_bug.txt', 'w', encoding='utf-8') as f:
    f.write(sys.version + '\n')
    f.write('Should be: Střední Evropa (běžný čas) | Střední Evropa (letní čas)\n')
    f.write('but it is: ' + time.tzname[0] + ' | ' + time.tzname[1] + '\n')
    f.write('types: ' + repr(type(time.tzname[0])) + ' | ' +
            repr(type(time.tzname[1])) + '\n')
    f.write('Should be as ascii: ' +
            ascii('Střední Evropa (běžný čas) | Střední Evropa (letní čas)') + '\n')
    f.write('but it is as ascii: ' + ascii(time.tzname[0]) + ' | ' +
            ascii(time.tzname[1]) + '\n')
---

It creates tzname_bug.txt with the following content (copy/pasted from a
Unicode-capable editor, Notepad++; the indicator in the bottom right corner
shows UTF-8):
---
3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AMD64)]
Should be: Střední Evropa (běžný čas) | Střední Evropa (letní čas)
but it is: Støední Evropa (bìžný èas) | Støední Evropa (letní èas)
types: <class 'str'> | <class 'str'>
Should be as ascii: 'St\u0159edn\xed Evropa (b\u011b\u017en\xfd \u010das) | St\u0159edn\xed Evropa (letn\xed \u010das)'
but it is as ascii: 'St\xf8edn\xed Evropa (b\xec\x9en\xfd \xe8as)' | 'St\xf8edn\xed Evropa (letn\xed \xe8as)'
---
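
The garbled output above is consistent with cp1250 bytes being read as if each byte were a Unicode code point (that is, as latin-1). The sketch below only reproduces that effect on the expected string; it assumes cp1250 as the native encoding and does not touch `time.tzname` itself.

```python
# Expected name for the Czech locale (standard time).
expected = 'Střední Evropa (běžný čas)'

# Bytes as the Windows library would return them in the cp1250 code page.
native_bytes = expected.encode('cp1250')

# Treating every byte as a code point of the same value is what latin-1
# decoding does; this yields the garbled string.
garbled = native_bytes.decode('latin-1')

print(ascii(expected))
print(ascii(garbled))
```

The second printed line matches the "but it is as ascii" value reported for `time.tzname[0]`.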

--
nosy: +prikryl
Added file: http://bugs.python.org/file40507/tzname_bug.py

___
Python tracker 
<http://bugs.python.org/issue16322>
___



[issue16322] time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding

2015-09-19 Thread Petr Prikryl

Petr Prikryl added the comment:

I have worked around it a bit differently. Here is a snippet from the code:

import time

result = time.tzname[0]  # simplified version of the original code

# Because of the bug in Windows libraries, Python 3.3 tried to work around
# some issues. However, the shit hit the fan, and the bug bubbled here.
# The `time.tzname` elements are (unicode) strings; however, they were
# filled with bad content. See https://bugs.python.org/issue16322 for details.
# Actually, wrong characters were passed instead of the good ones.
# This code should be skipped later by versions of Python that will fix
# the issue.
import locale
import platform
if platform.system() == 'Windows':
    # The concrete example for the Czech locale:
    # - cp1250 (windows-1250) is used as the native encoding
    # - time.tzname[0] should start with 'Střední Evropa'
    # - ascii('Střední Evropa') should return "'St\u0159edn\xed Evropa'"
    # - because of the bug it returns "'St\xf8edn\xed Evropa'"
    #
    # The 'ř' character has the Unicode code point `\u0159` (that is hex)
    # and the `\xF8` code in cp1250. The `\xF8` was wrongly used
    # as the Unicode code point `\u00F8` -- this is the Unicode
    # character 'ø' that is observed in the string.
    #
    # To fix it, the `result` string must be reinterpreted with a different
    # encoding. When working with Python 3 strings, it can probably
    # be done only through the string representation and `eval()`. Here
    # `eval()` is not very dangerous because the string was obtained
    # from the OS library, and the values are limited to a certain subset.
    #
    # The `ascii()` literal is prefixed with the bytes-literal prefix `b`,
    # `eval`uated, and the bytes result is decoded to the correct string.
    local_encoding = locale.getdefaultlocale()[1]
    b = eval('b' + ascii(result))
    result = b.decode(local_encoding)
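
The same reinterpretation can be sketched without `eval()`: encoding with latin-1 maps code points 0 to 255 back to bytes of the same value, which can then be decoded with the native code page. This is only an illustration; the helper name `fix_mojibake` is hypothetical, and cp1250 is hard-coded as an assumption so that the example does not depend on the machine's locale.

```python
def fix_mojibake(text, native_encoding='cp1250'):
    """Reinterpret code points 0..255 as bytes in `native_encoding`.

    Hypothetical helper. It raises UnicodeEncodeError if `text` contains
    a character above U+00FF, i.e. if the string was not garbled this way.
    """
    return text.encode('latin-1').decode(native_encoding)


# The garbled value reported for the Czech locale.
broken = 'St\xf8edn\xed Evropa (b\xec\x9en\xfd \xe8as)'
fixed = fix_mojibake(broken)

# The eval()-based approach from the snippet above, for comparison.
via_eval = eval('b' + ascii(broken)).decode('cp1250')

print(ascii(fixed))
print(fixed == via_eval)
```

Both approaches recover the same bytes; the latin-1 round trip simply avoids building and evaluating a bytes literal.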

--




[issue16322] time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding

2015-09-21 Thread Petr Prikryl

Petr Prikryl added the comment:

@eryksun: I see. In my case, I can set the locale before importing the time 
module. However, the code (asciidoc3.py) will be used as a module, and I cannot 
know if the user imported the time module or not.

Instead of your suggestion 
result = result.encode('latin-1').decode('mbcs')

I was thinking of creating a module, say workaround16322.py, like this:

---
import locale
locale.setlocale(locale.LC_ALL, '')

import importlib
import time
importlib.reload(time)
---

I thought that reloading the time module would be the same as importing it
later, after setting the locale. If that worked, the module could simply be
imported wherever it was needed. However, it does not work when imported after
importing time. What is the reason? Does reload() work only for modules coded
as Python sources? Is there any other approach that would implement the
workaroundXXX.py module?

--




[issue16322] time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding

2015-09-22 Thread Petr Prikryl

Petr Prikryl added the comment:

@eryksun: Thanks for your help. I have finally ended up with your suggestion:

"Call setlocale(LC_CTYPE, ''), and then call time.strftime('%Z') to get the 
timezone name."
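
A minimal sketch of that suggestion follows. The function name `local_timezone_name` is hypothetical, and the `locale.Error` guard is an added assumption for environments where the default locale cannot be set.

```python
import locale
import time


def local_timezone_name():
    """Return the current timezone name via time.strftime('%Z')."""
    try:
        # Adopt the user's default LC_CTYPE setting, as suggested above.
        locale.setlocale(locale.LC_CTYPE, '')
    except locale.Error:
        # Assumption: if the default locale is unavailable, keep the current one.
        pass
    return time.strftime('%Z')


# ascii() keeps the printed value safe for any console encoding.
print(ascii(local_timezone_name()))
```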

--
