STINNER Victor added the comment:
> What do you gain with this patch? (i.e. what is its advantage?)
You learn immediately that os.listdir(bytes) is unable to encode the filename,
instead of manipulating an invalid filename (b'?') and getting the error later (when
you use the filenam
STINNER Victor added the comment:
> FindFirst/NextFileA will also do some other interesting conversions,
> such as the best-fit conversion (which the "mbcs" code doesn't do
> (anymore?)).
About mbcs, mbcs codec of Python 3.1 is like .encode('mbcs', 'repl
STINNER Victor added the comment:
It reminds me of the discussion of the issue #3187. About unencodable filenames,
Guido proposed to ignore them or to use errors="replace", and wrote "Failing
the entire os.listdir() call is not acceptable". (... long discussion ...) And
STINNER Victor added the comment:
> FindFirst/NextFileA will also do some other interesting conversions,
> such as the best-fit conversion (which the "mbcs" code doesn't do
> (anymore?)).
If we choose to keep this behaviour, I will have to revert my commit on mbcs
cod
STINNER Victor added the comment:
> I fail to see why removing incorrect file names from the result
> list is any better than keeping them. The result list will
> be incorrect either way.
It depends on whether you focus on displaying the content of the directory, or on
processing
STINNER Victor added the comment:
> I think trying to emulate, in Python, what the *A functions
> do is futile.
My problem is that some functions will use mbcs in strict mode (functions using
PyUnicode_EncodeFSDefault) and raise UnicodeEncodeError, and others will use mbcs
in replac
STINNER Victor added the comment:
- ignoring unencodable filenames is not a good idea
- raising an error on unencodable filenames breaks backward compatibility
- I don't think that emitting a warning will change anything
Even if I don't like mbcs+replace (current behaviour of os.listdir(
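The three policies weighed in this thread (skip the name, fail the whole call, or substitute a replacement character) can be compared with a small sketch. It is only an illustration: encode_listing() is a hypothetical helper, not part of os.listdir(), and ASCII stands in for the Windows-only mbcs codec.

```python
def encode_listing(names, policy, encoding="ascii"):
    """Encode a list of str filenames to bytes under one of three policies.

    policy is "skip", "replace" or "raise" (hypothetical names used only
    for this illustration).
    """
    result = []
    for name in names:
        try:
            result.append(name.encode(encoding))
        except UnicodeEncodeError:
            if policy == "skip":
                # Drop the unencodable name from the listing.
                continue
            if policy == "replace":
                # Keep an entry, but with "?" in place of the bad character.
                result.append(name.encode(encoding, "replace"))
                continue
            # "raise": fail the entire call.
            raise
    return result


names = ["a.txt", "caf\u00e9.txt"]
print(encode_listing(names, "skip"))
print(encode_listing(names, "replace"))
```

With "replace" the caller receives a name that does not exist on disk, and with "skip" the caller never sees the file at all, which is the trade-off the comments above argue about.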
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file18841/test_pep277.patch
___
Python tracker
<http://bugs.python.org/issue767645>
___
___
Python-bug
STINNER Victor added the comment:
r84784 sets os.path.supports_unicode_filenames to True on Mac OS X (macpath
module).
About test_supports_unicode_filenames.patch. test_unicode_listdir() is wrong:
os.listdir(str) always returns str (see r84701). "verify that the new file's
name i
STINNER Victor added the comment:
I backported r84701 and r84784 to Python 2.7 (r84787).
--
___
Python tracker
<http://bugs.python.org/issue767645>
___
___
Pytho
Changes by STINNER Victor :
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/issue9819>
___
___
Python-bugs-list
STINNER Victor added the comment:
> There seems to be some confusion about the macpath.py module. (...)
Oops. I thought that Mac OS X uses macpath, but in fact it is posixpath. Can
you try my new patch posixpath_darwin.patch? I reopen the issue because I
patched the wrong module. I supp
STINNER Victor added the comment:
The solution may be different depending on Python version. I propose to keep
macpath in Python 2.7, just because it's too late to change such a thing in
Python 2. But we may mark macpath as deprecated, e.g. "macpath will be removed in
Python 3.2"
STINNER Victor added the comment:
For non-ascii directory name but ascii locale (eg. C locale), we have 3 choices:
a- read Makefile as a binary file
b- use the PEP 383
c- refuse to compile
(a) doesn't seem easy because it looks like distutils uses the unicode type for
all path
STINNER Victor added the comment:
Warning: "use the PEP 383" may impact other distutils components because the
path may be written into other files, which means that we have to use
errors='surrogateescape' for these files too.
--
__
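A minimal sketch of the concern raised in the warning above: once a path has been decoded with surrogateescape, every file it is written to must be opened with the same error handler, otherwise the write fails. The path and the file name paths.txt are hypothetical, and ASCII plays the role of the C locale encoding.

```python
import os
import tempfile

# Hypothetical non-ASCII directory name, as raw bytes from the filesystem.
raw_path = b"/home/user/py3k\xc3\xa9"

# PEP 383: undecodable bytes become lone surrogates instead of raising.
decoded = raw_path.decode("ascii", "surrogateescape")

# A strict ASCII encode of the decoded path fails...
try:
    decoded.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ...so a file that stores the path needs errors="surrogateescape" too.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "paths.txt")
    with open(target, "w", encoding="ascii",
              errors="surrogateescape", newline="\n") as f:
        f.write(decoded + "\n")
    with open(target, "rb") as f:
        on_disk = f.read()

print(strict_ok, on_disk == raw_path + b"\n")
```

The bytes on disk are identical to the original path, which is the property that makes surrogateescape usable for round trips.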
Changes by STINNER Victor :
--
nosy: +haypo
___
Python tracker
<http://bugs.python.org/issue8998>
___
___
Python-bugs-list mailing list
Unsubscribe:
STINNER Victor added the comment:
> No problems noted with a quick test of posixpath_darwin.patch
> on 10.6 so looks good.
Ok thanks. Fix committed to 3.2 (r84866) and 2.7 (r84868). I kept my patch on
macpath (supports_unicode_filenames=True) because it is still valid (even if it
is no
STINNER Victor added the comment:
I don't see any test_warnings anymore on
http://code.google.com/p/bbreport/wiki/PythonBuildbotReport. Close this issue.
--
status: open -> closed
___
Python tracker
<http://bugs.python.or
Changes by STINNER Victor :
--
nosy: +haypo
___
Python tracker
<http://bugs.python.org/issue4661>
___
STINNER Victor added the comment:
New version of the patch:
- reencode sys.path_importer_cache (and remove the last FIXME)
- fix different reference leaks
- catch PyIter_Next() failures
- create a subfunction to reencode sys.modules: it's easier to review and
manage errors in sh
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file18561/reencode_modules_path-2.patch
___
Python tracker
<http://bugs.python.org/issue9630>
___
___
STINNER Victor added the comment:
> I would rename the feature to something like "redecode-modules"
Yes, right. I will rename the functions before committing the patch.
--
___
Python tracker
<http://bugs.pyth
STINNER Victor added the comment:
> Why is this needed ?
Short answer: to support a filesystem encoding other than utf-8. See #8611
for a longer explanation.
Example:
$ pwd
/home/SHARE/SVN/py3ké
$ PYTHONFSENCODING=ascii ./python test_fs_encoding.py
Fatal Python error: Py_Initial
STINNER Victor added the comment:
> Not sure it's related, but there seems to be a bug:
It's not a bug, it's a feature :-) If you specify a non-existing locale, the
GNU libc falls back to ascii.
$ locale -a
C
français
french
fr_FR
fr...@euro
fr_FR.iso88591
fr_fr.iso885.
STINNER Victor added the comment:
> Some things about your patch:
> - as Amaury said, functions should be named "redecode*"
> rather than "reencode*"
Yes, as written before (msg117269), I will do it in my next patch.
> - please use -1 for error return, n
STINNER Victor added the comment:
On Friday 24 September 2010 14:35:29, Marc-Andre Lemburg wrote:
> Thanks for the explanation. So the only reason why you have to go through
> all those hoops is to
>
> * allow the complete set of Python supported encoding names
STINNER Victor added the comment:
Can't we use RegEnumValueW and RegQueryInfoKeyW?
--
___
Python tracker
<http://bugs.python.org/issue9937>
___
___
Pytho
STINNER Victor added the comment:
On Tuesday 28 September 2010 22:24:56, you wrote:
> I disagree. PyObject_As*Buffer functions are remnants of the old buffer
> API in Python 2.x. They are here only to ease porting of existing C
> code, but carefully written 3.x code should
New submission from STINNER Victor :
PyUnicode_AsWideChar() doesn't merge surrogate pairs on a system with 32 bits
wchar_t and Python compiled in narrow mode (sizeof(wchar_t) == 4 and
sizeof(Py_UNICODE) == 2) => see issue #8670.
It is not easy to fix this problem because the ca
STINNER Victor added the comment:
#9979 proposes to create a new PyUnicode_AsWideCharString() function.
--
___
Python tracker
<http://bugs.python.org/issue8
STINNER Victor added the comment:
New version of the patch:
- fix PyUnicode_AsWideCharString() :-)
- replace PyUnicode_AsWideChar() by PyUnicode_AsWideCharString() in most
functions using PyUnicode_AsWideChar()
- indicate that PyUnicode_AsWideCharString() raises a MemoryError on error
Keep
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file19054/pyunicode_aswidecharstring.patch
___
Python tracker
<http://bugs.python.org/issue9979>
___
___
STINNER Victor added the comment:
See also issue #4626 which introduced PyCF_IGNORE_COOKIE and
PyPARSE_IGNORE_COOKIE flags to support unicode string for the builtin compile()
function.
--
nosy: +haypo
___
Python tracker
<http://bugs.python.
STINNER Victor added the comment:
> But shouldn't PyUnicode_AsWideCharString() merge surrogate pairs when it
> can? The implementation doesn't do this.
I don't want to do two different things at the same time. My plan is:
- create PyUnicode_AsWideCharString()
- use PyUni
STINNER Victor added the comment:
I fixed this issue in multiple commits:
- r85093: create PyUnicode_AsWideCharString()
- r85094: use it in import.c
- r85095: use it for _locale.strcoll()
- r85096: use it for time.strftime()
- r85097: use it in _ctypes module
> So, you agree with
STINNER Victor added the comment:
Forget my previous message, I forgot important points.
> So the only reason why you have to go through
> all those hoops is to
>
> * allow the complete set of Python supported encoding
> names for the PYTHONFSENCODING
>
>
STINNER Victor added the comment:
Patch version 4:
- Rename "reencode" to "redecode"
- Return -1 (instead of 1) on error
--
title: Reencode filenames when setting the filesystem encoding -> Redecode
filenames when setting the filesystem encoding
Added file:
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file18996/reencode_modules_path-3.patch
___
Python tracker
<http://bugs.python.org/issue9630>
___
___
STINNER Victor added the comment:
On Wednesday 29 September 2010 13:45:15, you wrote:
> Marc-Andre Lemburg added the comment:
>
> STINNER Victor wrote:
> > STINNER Victor added the comment:
> >
> > Forget my previous message, I forgot important points.
>
STINNER Victor added the comment:
I committed redecode_modules_path-4.patch as r85115 in Python 3.2.
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/
STINNER Victor added the comment:
r85115 closes #9630: an important patch for #9425, redecode all filenames when
setting the filesystem encoding.
Next tasks (maybe not in this order):
- merge getpath.c
- redecode argv[0] used by PySys_SetArgvEx() to feed sys.path (encode argv[0]
with the
New submission from STINNER Victor :
$ PYTHONFSENCODING=latin-1 ./python Lib/test/test_warnings.py
...
==
FAIL: test_nonascii (__main__.CEnvironmentVariableTests
STINNER Victor added the comment:
If I understood correctly, you don't want the value to be truncated if the
variable grows between the two calls to confstr(). Which behaviour would you
expect? A Python exception?
> but Victor Stinner has expressed concern that a buggy
> conf
STINNER Victor added the comment:
> OK, so who's messing up: subprocess or Py_main()?
Well, this is the real question :-)
locale encoding is used to decode command line arguments (sys.argv), filesystem
encoding is used to decode environment variables and to encode subprocess
argum
New submission from STINNER Victor :
On UNIX/BSD systems, Python decodes arguments with the locale encoding, whereas
subprocess encodes arguments with the filesystem encoding. If the two encodings
are different, we have a problem.
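To make the mismatch concrete, here is a small sketch under stated assumptions: roundtrip() is a hypothetical helper that mimics a parent process encoding an argument with one codec and a child decoding it with another. UTF-8 and Latin-1 are arbitrary stand-ins for the filesystem and locale encodings.

```python
def roundtrip(arg, encode_with, decode_with):
    """Encode arg as a parent would, then decode it as a child would."""
    raw = arg.encode(encode_with, "surrogateescape")
    return raw.decode(decode_with, "surrogateescape")


original = "\u00e9.py"

# Same codec on both sides: the argument survives unchanged.
same = roundtrip(original, "utf-8", "utf-8")

# Different codecs: the child sees mojibake instead of the original name.
mixed = roundtrip(original, "utf-8", "latin-1")

print(same == original, mixed == original)
```

The data is not lost in the mixed case, but the child sees a different string than the one the parent passed, so any comparison or file lookup based on it goes wrong.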
There was already the issue #4388 but it was closed because it
STINNER Victor added the comment:
I don't understand why you would like to implicitly convert bytes to str (which
is one of the worst design choices of Python 2). If you don't want to care about
encodings, using bytes is fine. Decoding bytes using an arbitrary encoding is the
fast
STINNER Victor added the comment:
> Indeed, the fs encoding isn't initialized until later in
> Py_InitializeEx. Maybe the PYTHONWARNINGS code should be moved
> there instead?
sys.warnopts should be filled early because it is used to initialize the
_warnings module, and the _w
STINNER Victor added the comment:
[cmdline_encoding-2.patch] Patch to use locale encoding to decode and encode
command line arguments. Remarks about the patch:
- failing to get the locale encoding (very unlikely) is a fatal error
- TODO: in initfsencoding(), Py_FileSystemDefaultEncoding
STINNER Victor added the comment:
> Maybe the PYTHONWARNINGS code should be moved there instead?
sys.warnoptions is read by the warnings module (not the _warnings module) when
this module is loaded. The warnings module is loaded by Py_InitializeEx() if
sys.warnoptions list is not empty.
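As a quick way to observe what sys.warnoptions contains, the sketch below starts a child interpreter with one -W option and prints the list it sees. The -E flag is added here only so that a PYTHONWARNINGS value in the caller's environment does not alter the result; it is not part of the discussion above.

```python
import subprocess
import sys

# Run a child interpreter with a single -W option and report sys.warnoptions.
child = subprocess.run(
    [sys.executable, "-E", "-W", "error",
     "-c", "import sys; print(sys.warnoptions)"],
    capture_output=True,
    text=True,
    check=True,
)
reported = child.stdout.strip()
print(reported)
```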
STINNER Victor added the comment:
> The problem with command line arguments is that they don't necessarily
> have just one encoding (just like env vars may well use more than
> one encoding) on Unix platforms.
The issue #8776 proposes the creation of sys.argv.
> When using pa
STINNER Victor added the comment:
Extract of an interesting message (msg111432) of #8775 (issue specific to Mac
OS X):
<< A system where the filesystem encoding doesn't match the locale encoding is
hard to get right. While it would be possible to add sys.cmdlineencoding th
STINNER Victor added the comment:
> A system where the filesystem encoding doesn't match the locale
> encoding is hard to get right.
Mmmh. The problem is maybe that the new PYTHONFSENCODING environment variable
(added by #8622) introduced a horrible inconsistency between Pytho
STINNER Victor added the comment:
> Option 2 (the alternative Antoine suggested and I'm considering):
> - "decode" ... to str ...
> - ... objects are "encoded" back to actual bytes before
> they are returned
In this case, you have to be very careful to
STINNER Victor added the comment:
Update the patch for the new PyUnicode_AsWideCharString() function:
- use Py_UNICODE_SIZE and SIZEOF_WCHAR_T in the preprocessor tests
- faster loop: don't use a counter + pointer, but only use pointers (for the
stop condition)
The patch is not finish
Changes by STINNER Victor :
Removed file:
http://bugs.python.org/file17322/pyunicode_aswidechar_surrogates-py3k.patch
___
Python tracker
<http://bugs.python.org/issue8
STINNER Victor added the comment:
Patch version 3:
- fix unicode_aswidechar if Py_UNICODE_SIZE == SIZEOF_WCHAR_T and w == NULL
(return the number of characters, don't write into w!)
- improve unicode_aswidechar() comment
--
Added file: http://bugs.python.org/file
STINNER Victor added the comment:
I don't know how to test "if Py_UNICODE_SIZE == 4 && SIZEOF_WCHAR_T == 2". On
Windows, sizeof(wchar_t) is 2, but it looks like Python is not prepared to have
Py_UNICODE != wchar_t for its Windows implementation.
wchar_t is 32 bits lon
STINNER Victor added the comment:
Patch version 4:
- implement unicode_aswidechar() for 16 bits wchar_t and 32 bits Py_UNICODE
- PyUnicode_AsWideCharString() returns the number of wide characters
excluding the nul character as does PyUnicode_AsWideChar()
For 16 bits wchar_t and 32 bits
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file19082/aswidechar_nonbmp-2.patch
___
Python tracker
<http://bugs.python.org/issue8670>
___
___
Pytho
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file19083/aswidechar_nonbmp-3.patch
___
Python tracker
<http://bugs.python.org/issue8670>
___
___
Pytho
STINNER Victor added the comment:
Ooops, I lost my patch to fix the initial (ctypes) issue. Here is an updated
patch: ctypes_nonbmp.patch (which needs aswidechar_nonbmp-4.patch).
--
Added file: http://bugs.python.org/file19101/ctypes_nonbmp.patch
Changes by STINNER Victor :
--
title: Allow bytes in some APIs that use string literals internally ->
urllib.parse: Allow bytes in some APIs that use string literals internally
___
Python tracker
<http://bugs.python.org/iss
STINNER Victor added the comment:
r85172 changes PyUnicode_AsWideCharString() (don't count the trailing nul
character in the output size) and adds unit tests.
r85173 patches unicode_aswidechar() to support non-BMP characters for all
known wchar_t/Py_UNICODE size combinations (2/2, 2/4
STINNER Victor added the comment:
r85174+r85177: ctypes.c_wchar supports non-BMP characters with 32 bits wchar_t
=> fix this issue
(I also committed an unwanted change on _testcapi to fix r85172 in r85174:
r85175 reverts this change, and r85176 fixes the _testcapi bug ag
STINNER Victor added the comment:
> r85173 patches unicode_aswidechar() to support non-BMP characters
> for all known wchar_t/Py_UNICODE size combinations (2/2, 2/4 and 4/2).
Oh, and 4/4 ;-)
--
___
Python tracker
<http://bugs.p
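The size combinations listed above matter because a non-BMP character is one unit when the element type is 32 bits wide and a surrogate pair when it is 16 bits wide. The following sketch shows the standard UTF-16 arithmetic in pure Python. It illustrates the conversion only and is not the C code of unicode_aswidechar(); both function names are hypothetical.

```python
import struct


def to_surrogate_pair(code_point):
    """Split a non-BMP code point into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a non-BMP code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def merge_surrogate_pair(high, low):
    """Merge a (high, low) surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
merged = merge_surrogate_pair(high, low)

# Cross-check against Python's own UTF-16 encoder.
packed = struct.pack("<HH", high, low)
print(hex(high), hex(low), hex(merged), packed == "\U0001F600".encode("utf-16-le"))
```

Going from 16-bit units to 32-bit units needs the merge step, and the opposite direction needs the split step, which is why each size combination has to be handled separately.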
New submission from STINNER Victor :
In the following example, sys.path[0] should be
'/home/SHARE/SVN/py3k\udcc3\udca9' (my locale and filesystem encodings are
utf-8):
$ cd /home/SHARE/SVN/py3ké
$ echo "import sys; print(sys.path[0])" > x.py
$ ./python x.p
STINNER Victor added the comment:
See also #10014: sys.path[0] is decoded from the locale encoding instead of the
filesystem encoding.
--
___
Python tracker
<http://bugs.python.org/issue9
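The escaped string quoted in the submission above, '/home/SHARE/SVN/py3k\udcc3\udca9', has the shape produced when the UTF-8 bytes of "é" are decoded as ASCII with the surrogateescape handler. The sketch below only demonstrates that codec behaviour. It does not reproduce the truncated command from the report.

```python
# UTF-8 bytes of the directory name used in the report.
raw = "/home/SHARE/SVN/py3k\u00e9".encode("utf-8")

# Decoding as ASCII with surrogateescape maps each undecodable byte 0xNN
# to the lone surrogate U+DCNN.
escaped = raw.decode("ascii", "surrogateescape")

# Encoding with the same codec and handler restores the original bytes.
restored = escaped.encode("ascii", "surrogateescape")

print(ascii(escaped), restored == raw)
```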
STINNER Victor added the comment:
> Since this was a bugfix, it should be merged back into 2.7, yes?
Mmmh, the fix requires changing the PyUnicode_AsWideChar() function (to support
non-BMP characters and surrogate pairs) (and maybe also creating
PyUnicode_AsWideCharString()). I don't rea
STINNER Victor added the comment:
> If you were worried about performance, then surrogateescape is certainly
> much slower than latin1.
If you were really worried about performance, the bytes type is maybe faster
than: decode bytes to str using latin-1, process str strings, encode
STINNER Victor added the comment:
See also issue #10039.
--
___
Python tracker
<http://bugs.python.org/issue10014>
___
STINNER Victor added the comment:
> The problem is that PySys_SetArgvEx() ...
Not only PySys_SetArgvEx(). There is another issue with RunMainFromImporter()
which does: sys.path[0] = filename
--
___
Python tracker
<http://bugs.python.org/issu
New submission from STINNER Victor :
If a program has a non-ASCII character in its name and/or full path
and PYTHONFSENCODING is set to an encoding different from the locale encoding,
Python fails to open the program.
Example in the utf-8 locale:
$ PYTHONFSENCODING=ascii ./python
STINNER Victor added the comment:
This issue depends on issue #10039.
--
dependencies: +python é.py fails with UnicodeEncodeError if PYTHONFSENCODING is
used
___
Python tracker
<http://bugs.python.org/issue10
STINNER Victor added the comment:
r85302: _wrealpath() and _Py_wreadlink() support surrogates in the input path.
--
realpath_fs_encoding.patch: patch _wrealpath() to encode the resulting path
with the filesystem encoding (with surrogateescape) instead of the locale
encoding. This patch is
STINNER Victor added the comment:
I just created Python/fileutils.c: update the patch for this new file.
--
Added file: http://bugs.python.org/file19153/realpath_fs_encoding-2.patch
___
Python tracker
<http://bugs.python.org/issue10
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file19147/realpath_fs_encoding.patch
___
Python tracker
<http://bugs.python.org/issue10014>
___
___
Pytho
STINNER Victor added the comment:
There was a bug in copy_absolute(): if _Py_wgetcwd() failed, the result was
undefined (depending on the content of the "path" buffer). In particular, absolutize()
calls copy_absolute() with a buffer allocated on the stack: the content of this
buffer depe
STINNER Victor added the comment:
deleted_cwd.patch, patch based on labrat's patch updated to py3k:
http://www.physics.drexel.edu/~wking/code/hg/hgwebdir.cgi/python/rev/77f3ad10ba45
Procedure to test the patch:
- go into Python source tree
- make a directory "z"
- en
Changes by STINNER Victor :
--
title: Add the null context manager to contextlib -> Add a "no-op" (null)
context manager to contextlib
___
Python tracker
<http://bugs.pytho
STINNER Victor added the comment:
About your patch:
- __enter__() might return self instead of None... I don't really know which
choice is better. "with Null() as x:" works in both cases
- __exit__() has no result value, "pass" is enough
- I don't li
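The first two review points can be sketched as follows. This is an illustrative minimal class, not the patch under review: it picks the "return self" option for __enter__() and uses a bare "pass" in __exit__(), as the comments suggest.

```python
class Null:
    """A no-op context manager: entering and leaving do nothing."""

    def __enter__(self):
        # Returning self makes "with Null() as x:" bind a usable object.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Implicitly returns None, so exceptions raised in the body propagate.
        pass


with Null() as x:
    entered = isinstance(x, Null)

try:
    with Null():
        raise KeyError("propagates")
    propagated = False
except KeyError:
    propagated = True

print(entered, propagated)
```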
STINNER Victor added the comment:
> FWIW, this still happens on the latest of /branches/py3k,
> when LANG does not match up to the enforced fs encoding
ixokai has the bug on Snow Leopard x86.
--
___
Python tracker
<http://bugs.p
STINNER Victor added the comment:
py3k_also_no_unicode_error_on_direct_test_run.patch comes a little bit too late:
$ LANG= ./python Lib/test/regrtest.py -v test_time
== CPython 3.2a2+ (py3k, Oct 8 2010, 01:40:20) [GCC 4.4.5 20100909 (prerelease)]
== Linux-2.6.32-trunk-686-i686-with-debian
STINNER Victor added the comment:
> For the record, this can be now reproduced under Linux by forcing different
> locale and filesystem encodings:
>
> $ PYTHONFSENCODING=utf8 LANG=ISO-8859-1 ./python -m test.regrtest
> test_cmd_line
I opened a separate issue for Linux, #999
STINNER Victor added the comment:
> Perhaps. We could also declare that command line arguments and
> environment variables are always UTF-8-encoded on OSX (which I think
> would be fairly accurate)
Python uses the filesystem encoding to encode/decode environment variables,
an
STINNER Victor added the comment:
> So perhaps it would be best if Python had two external default encodings:
> the IO one (command line arguments, environment variables, text files),
> and the file name encoding (defaulting to the IO encoding if not set)
Hum, I prefer to consid
STINNER Victor added the comment:
> We run into problems because we have two inconsistent
> encodings, ...
What? No. We have problems because we don't use the same encoding to decode and
to encode the same data type. It's not a problem to use a different encoding
for each d
STINNER Victor added the comment:
> > What? No. We have problems because we don't use the same encoding to
> > decode and to encode the same data type. It's not a problem to use a
> > different encoding for each data type (stdout, filenames, environment
> > var
STINNER Victor added the comment:
> > ... So Antoine and Martin: which encoding do you prefer?
>
> I still propose to drop the fsname encoding. Then this question goes away.
You mean that we should use the following encoding for the command line
arguments, environment varia
STINNER Victor added the comment:
MvL> > - Windows: unicode for command line/env, mbcs to decode filenames
MvL> No: unicode for filenames also.
Yes, I mean unicode for everything, but decode bytes data from the mbcs
encoding.
--
_
STINNER Victor added the comment:
MAL> If you remove the PYTHONFSENCODING, then we have to reconsider
MAL> removal of sys.setfilesystemencoding().
Please, Marc, read my comments. You never consider technical problems,
you just propose to ensure that "Python just work
STINNER Victor added the comment:
MAL> You can't just tell people to go with whatever encoding setup
MAL> you prefer to make Python's guessing easier or more correct.
Python doesn't really *guess* the encoding, it just reads the encoding from the
locale.
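A short sketch of reading those values from a running interpreter, using only standard library calls. Whether the two encodings agree depends on the platform and environment, so no particular output is implied.

```python
import codecs
import locale
import sys

# Encoding taken from the current locale settings; passing False avoids
# calling setlocale() as a side effect.
locale_encoding = locale.getpreferredencoding(False)

# Encoding Python uses for filenames.
fs_encoding = sys.getfilesystemencoding()

# Compare normalized codec names, since aliases such as "UTF-8" and "utf8"
# refer to the same codec.
same_codec = (codecs.lookup(locale_encoding).name
              == codecs.lookup(fs_encoding).name)

print(locale_encoding, fs_encoding, same_codec)
```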
What do you
STINNER Victor added the comment:
> I guess LANG and LC_CTYPE can be used for other purposes
> such as internationalization.
That's why there are different environment variables:
* LC_MESSAGES for i18n (messages)
* LC_CTYPE for the encoding
* LC_TIME for time and
STINNER Victor added the comment:
issue9992.patch:
- Remove PYTHONFSENCODING environment variable
- Mac OS X: Use utf-8 to decode command line arguments
- Fix issue #9992 (this issue): attached test, locale_fs_encoding.py, passes
- Fix issue #9988
- Fix issue #10014
- Fix issue #10039
STINNER Victor added the comment:
I think that issue9992.patch also fixes #4388 because it uses the same encoding
(FS encoding, utf8) on OSX to encode and to decode command line arguments.
--
___
Python tracker
<http://bugs.python.org/issue9
STINNER Victor added the comment:
> Oops, sorry. I'll withdraw my last patch.
Why? Your patch is useful to run a single test outside regrtest. But you should
not remove the hack on regrtest.py, only keep your patch on unittest/runner.py.
There are not e
New submission from STINNER Victor :
If the site module fails, the error is not logged because of a bug in
initsite(). The problem is that PyFile_WriteString() does nothing if an error
occurred.
- Edit Lib/site.py to add "raise Exception('xxx')" at the beginning of ma
STINNER Victor added the comment:
Fixed in 3.2 (r85386+r85387+r85389), 2.7 (r85390), 3.1 (r85391).
Thanks labrat for your patch. I added you to Misc/ACKS.
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.p
STINNER Victor added the comment:
New version of the patch:
- use more standard function names (_Py_initsegfault => _Py_InitSegfault)
- use "#ifdef HAVE_SIGACTION" to support systems without sigaction(): fall back
to signal()
- usage of the alternative stack is now opt
STINNER Victor added the comment:
Updated example:
--
$ ./python Lib/test/crashers/recursive_call.py
Fatal Python error: segmentation fault
Traceback (most recent call first):
File "Lib/test/crashers/recursive_call.py", line 12 in
File
Changes by STINNER Victor :
--
nosy: +dmalcolm
___
Python tracker
<http://bugs.python.org/issue8863>
___