k you'll have to write an alternative PEP if you want to see
something like this implemented throughout Python.
Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
would work no matter whether that expectation agrees
with reality or not. The amount of mojibake that you get is larger
when the disagreement is larger, but it will continue to *work*.
Regards,
Martin
se, can you please describe a
specific scenario? What application, what file names, what encodings,
what problems?
Regards,
Martin
which I agree with, but which clearly the Unix people
> here are not as certain of.
Ok, I have added another paragraph. Not sure whether it helps to clarify
though.
Regards,
Martin
e an example for a setup where it is not reversible.
Regards,
Martin
e to something, and then create files in their
home directory, it will work just fine, and nobody else will ever see
the files (except for the backup software).
When they find that the files they created are inaccessible to others,
they will often stop
ged the PEP to avoid using PUA characters at all.
Regards,
Martin
s)
the problem is more severe, and people are more likely to create
files with Cyrillic, or Japanese, names (say) if the system accepts
them at all.
Regards,
Martin
es
> (encoding would produce the original bytes).
>
> I'd prefer option 2.
I hadn't thought of this case, but you are right - they *are*
illegal bytes, after all. Raising an exception would be useless
since the whole point of this codec is to never raise unicode
errors.
Regards,
M
renamed because the target
name already exists.
In all these cases, the application has to ask the user to
reconsider; for at least the last case, it should be prepared
to do that, anyway (there is also the case where renaming fails
because of lack of permissions; in that case, picking a different
> Does anyone know what the problem is?
The hardware running it apparently has serious problems.
Upfronthosting, the company providing the hardware, is
working on a solution. Unfortunately, it is difficult to
get support from the datacenter on weekends.
Regards,
Mar
> How about another str-like type, a sequence of char-or-bytes?
That would be a different PEP. I personally like my own proposal
more, but feel free to propose something different.
Regards,
Martin
ter within the private use area, or a particular
> range, or what?
It's a range. The lower-case 'x' denotes a variable half-byte, ranging
from 0 to F. So this is the range U+F0100..U+F01FF, giving 256 code
points.
Regards,
Martin
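As a quick check of the arithmetic above, the following sketch enumerates that range. The helper name escape_byte_pua is hypothetical and only illustrates how one byte value could map onto one code point of the range; it is not taken from the PEP.

```python
# Illustrative only: the range U+F0100..U+F01FF described above.
PUA_START = 0xF0100
PUA_END = 0xF01FF


def escape_byte_pua(byte: int) -> str:
    """Hypothetical helper: map a byte value 0..255 to the code point U+F01xx."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in range(256)")
    return chr(PUA_START + byte)


code_points = range(PUA_START, PUA_END + 1)
print(len(code_points))                  # 256
print(hex(ord(escape_byte_pua(0xFF))))   # 0xf01ff
```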
u pass?
> And for a Unix filesystem mounted on a Windows host? Or accessed via
> some network connection?
Same issue really: what specific mounting software did you use? Windows
cannot mount Unix file systems on its own, or through some network
connection.
Regards,
Martin
therwise, if you can contribute a useful bug report (or even a patch),
please go ahead. I would try to turn logging on through the registry and
see whether that gives any insight.
Regards,
Martin
filename is not valid in my locale's encoding). The
> Nedit editor also worked. So far I haven't found anything that failed.
So what SMB server did you mount here, using what software, and what
mount options?
I think you might be referring to an entirely different use case.
Regard
Glenn Linderman wrote:
> On approximately 4/27/2009 12:42 PM, came the following characters from
> the keyboard of Martin v. Löwis:
>>>> It's a private use area. It will never carry an official character
>>>> assignment.
>>>
>>> I know that U+F
other solution. It was in
> that sense, of thinking about possibly existing practice, and leveraging
> an existing solution, that caused me to bring up the topic.
I think you make quite a lot of assumptions here. It would be better
to research the state of the art first, and onl
hon automatically encodes strings with the file system encoding
before passing them to the POSIX API.
Regards,
Martin
/.
I'm happy to exclude that range from the mapping if POSIX really
requires an encoding not to be overlapping with ASCII.
Regards,
Martin
n, at the moment.
Regards,
Martin
they don't support command
line arguments, or environment variables. If you want to complete them,
you should write a PEP.
Regards,
Martin
he same thing will
happen: the name of the file will be the very same byte sequence as the
one passed on the command line. Most Unix users here agree that this is
the right thing to happen.
Regards,
Martin
;.
>
> Since this byte sequence doesn't represent a valid character when
> decoded with UTF-8, it should simply be considered an invalid UTF-8
> sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
> '\udcff').
>
> Mart
upporting a LRU list, I would remove/hide all entries that don't
correlate to existing files - after all, the user may as well have
deleted the file in the LRU list.
Regards,
Martin
> If the PEP depends on this being changed, it should be mentioned in the
> PEP.
The PEP says that the utf-8b codec decodes invalid bytes into low
surrogates. I have now clarified that a strict definition of UTF-8
is assumed for utf-8b.
Regards,
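The decoding rule described above can be tried in a current Python. This is a minimal sketch, assuming Python 3.1 or later, where the error handler discussed in this thread as utf-8b / utf8b is available under the name "surrogateescape"; the sample bytes are made up for illustration.

```python
# Assumption: Python 3.1+, where the handler is spelled "surrogateescape".
raw = b"abc\xff"          # 0xff can never appear in valid UTF-8

# Each undecodable byte 0xNN becomes the lone low surrogate U+DCNN.
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))        # 'abc\udcff'

# Encoding with the same handler restores the original bytes.
restored = text.encode("utf-8", "surrogateescape")
print(restored == raw)    # True
```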
> Furthermore, I don't believe that PEP 383 works consistently on Windows,
What makes you say that? PEP 383 will have no effect on Windows,
compared to the status quo, whatsoever.
Regards,
Martin
most libraries is
> undefined for the kinds of unicode strings you construct, and it may be
> undefined in a bad way (crash, buffer overflow, whatever).
Indeed so. This is intentional. If you can crash Python that way,
nothing gets worse by this PEP - you can then *already* crash Python
i
MRAB wrote:
> Martin v. Löwis wrote:
>>> Furthermore, I don't believe that PEP 383 works consistently on Windows,
>>
>> What makes you say that? PEP 383 will have no effect on Windows,
>> compared to the status quo, whatsoever.
>>
> You could argue that
sk are also different in memory. No ambiguity.
Regards,
Martin
, it is.
Rest assured that the utf-8b codec will work the way it is specified.
Regards,
Martin
Glenn Linderman wrote:
> On approximately 4/28/2009 1:25 PM, came the following characters from
> the keyboard of Martin v. Löwis:
>>> The UTF-8b representation suffers from the same potential ambiguities as
>>> the PUA characters...
>>
>> Not at all the sa
I personally don't see a problem here - *of course* os.listdir will
report invalid utf-16 encodings, if that's what is stored on disk.
It doesn't matter whether the file names are valid wrt. some
specification. What matters is that you can access all the files.
Regards,
Martin
and the string interface to access the same file:
this would be a ridiculous interpretation. *Of course* you can
access /etc/passwd both as "/etc/passwd" and b"/etc/passwd",
there is nothing ambiguous about that.
Regards,
Martin
ith completely nonsensical
bytes. In practice, it probably won't be that bad - python-escape
has likely escaped all non-ASCII bytes, so that on re-encoding with
a different encoding, only the ASCII characters get encoded, which
likely will work fine.
Regards,
Martin
T] a b c
>
> which would produce "ABC" in unicode, which is ambiguous with:
>
> A B C
>
> which would also produce "ABC"?
No: the "shift" in "shift-jis" is not really about the shift key.
See http://en.wikipedia.org/wiki/Shift-JIS
Regar
> I would like utility functions to perform:
> os-bytes->funny-encoded
> funny-encoded->os-bytes
> or explicit example code snippets for same in the PEP text.
Done!
Martin
> OK, so you are saying that under PEP 383, utf-8b wouldn't be used
> anywhere on Windows by default. That's not clear from your proposal.
You didn't read it carefully enough. The first three paragraphs of
the "Specification" section ma
both get used.
>>
>> Your formulation is a bit too stenographic to me, but please trust me
>> that there is *no* ambiguity in the case you construct.
>
>
> No Martin, the point of reviewing the PEP is to _not_ trust you, even
> though you are generally very knowledgeab
ready produces errors, that
> UTF-8 is a special case, not the only case.
I have fixed that by extending the third paragraph.
> The code added to the discussion has mismatched (), making me wonder if
> it is complete. There is a reasonable possibility that only the final )
> is missing
es on being able to code half surrogates as UTF-8?
Can you please elaborate? What code specifically are you talking about?
Regards,
Martin
are a little clearer in the documentation for
> sys.setfilesystemencoding, which does say the encoding isn't used by
> Windows -- so why is it permitted to change it, if it has no effect?).
As in many cases: because nobody contributed code to make it behave
otherwise. It's not that
n the
LRU list, and lookup the file by inode number - or object UUID
on NTFS, possibly using distributed link tracking).
Regards,
Martin
) and string.atoi() allow whitespace. Maybe I'm
> thinking of trailing non-numeric, non-whitespace characters.
Maybe you remember truly *embedded* whitespace:
py> float("1. 3")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: in
ames to bytes. I have clarified the PEP to make that
explicit. IOW, it replaces the current "strict" setting in these cases.
Regards,
Martin
> But I shouldn't have to guess. The PEP should explain how these things
> are useful. The discussion section could be extended with use cases for
> both the encode and decode cases.
See PEP 293.
Regards,
Martin
d that filenames come in two forms
> unicode and bytes if its not utf-8 data. Why not simply return string if
> its valid utf-8 otherwise return bytes?
That would have been an alternative solution, and the one that 2.x uses
for listdir. People di
is what happens when you try to work
> with them. Please, let's see some code we can run, not more words.
Just try my example above, on a Linux system, in a UTF-8 locale.
Regards,
Martin
t on the other VMs, those would have
to either implement it natively, or provide byte-oriented APIs to allow
Jython/IronPython to implement it on top of it (the latter being not
realistic or useful).
Regards,
Martin
ferent
things, they have to tell me what specifically they want it to say
(perhaps even with specific formulations). If they can't communicate
their requests to me, I can't comply.
Regards,
Martin
discussions could be reduced if readers would try to
constructively comment on the PEP, rather than making counter-proposals,
or making statements about the PEP without making their implied
assumptions explicit.
Regards,
Martin
Jeroen Ruigrok van der Werven wrote:
> -On [20090430 07:18], "Martin v. Löwis" (mar...@v.loewis.de) wrote:
>> Suppose I create a new directory, and run the following script
>> in 3.x:
>>
>> py> open("x","w").close()
>> py> o
> incompatible with CPython running on UNIX, and there's no way to fix that.
*Not* adapting the PEP will also make CPython and IronPython
incompatible, and there's no way to fix that.
Regards,
Martin
code support (via UTF-8) before Windows.
If so, PEP 383 won't hurt. If you never get decode errors for file
names, you can just ignore PEP 383. It's only for those of us who do
get decode errors.
Regards,
Martin
l files on disk.
Neither Java nor Mono is capable of doing that.
Regards,
Martin
) behaves the same way.
PyQt displays a single square box.
Regards,
Martin
> Assuming people agree that this is an accurate summary, it should be
> incorporated into the PEP.
Done!
Regards,
Martin
> I think it has to be excluded from mapping in order to not introduce
> security issues.
I think you are right. I have now excluded ASCII bytes from being
mapped, effectively not supporting any encodings that are not
ASCII-compatible. Does that sound ok?
Regards,
> OK, so what's wrong with os.listdir() and similar functions returning a
> unicode string for strings that correctly encode/decode, and with byte
> strings for strings that are not valid unicode?
See http://bugs.python.org/issue3187
in particular msg71655
R
't be rushed through, and
> one should look more carefully first at what the Windows kernel does in
> these situations, and what Mono and Java do.
These questions really have been studied on this list for the last eight
years, over and over again. It's not being rushed.
Regards,
Marti
with the string-oriented ones. In the rationale, the PEP
explains why I consider this the worse choice.
Regards,
Martin
UTF-8.
Regards,
Martin
PEP 382. I have
now changed the PEP to call the files .pth, more in line
with how top-level .pth files work, and added a statement
that the import feature of .pth files is not provided for
package .pth files (use __init__.py instead).
Regards,
Martin
30553): WARNING **: FindNextFile: Bad encoding for
'/home/martin/work/3k/t/\xff'
Consider using MONO_EXTERNAL_ENCODINGS
when running the program
using System.IO;
class X{
public static void Main(string[] args){
DirectoryInfo di = new DirectoryInfo(".");
foreach(
it to be.
Regards,
Martin
re analogous
to the one I got when using System.IO.DirectoryInfo ever exist in
Python?
Regards,
Martin
might have to bind other arguments with utf-8 conversion.
I couldn't find a Python wrapper for libtiff. If a wrapper was written,
it would indeed have to use the file system encoding for the file name
parameters. However, it would have to do that even without PEP 383,
since the file name should
> Martin, if you're going to stick with the half-surrogate trick, would
> you mind adding a section to the PEP on "alternate encoding strategies",
> explaining why the NULL method was not selected?
In the PEP process, it isn't my job to criticize competing prop
), listdir() et al. will continue to accept bytes
> for filenames?
In Python 3, the file chooser should definitely return strings, and it
would be good if they were PEP 383 compliant.
>> So I prefer the half surrogate because its failure mode is better th
>
> Heh heh heh.
And i
that strings decoded
> from the filesystem are reversible, but not check what might be de novo
> strings?
Exactly so.
Regards,
Martin
> myself?
No, you should encode using the "strict" error handler, with the
locale encoding. If the file name encodes successfully, it's correct,
otherwise, it's broken.
Regards,
Martin
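The check described above can be sketched as follows. The function name is_well_encoded is hypothetical, and defaulting to sys.getfilesystemencoding() as a stand-in for the locale encoding is an assumption made for this illustration.

```python
import sys
from typing import Optional


def is_well_encoded(name: str, encoding: Optional[str] = None) -> bool:
    """Hypothetical helper: True if `name` encodes cleanly with "strict".

    Assumption: when no encoding is given, the file system encoding
    is used in place of the locale encoding.
    """
    encoding = encoding or sys.getfilesystemencoding()
    try:
        name.encode(encoding, "strict")
    except UnicodeEncodeError:
        # Escaped (undecodable) bytes show up as lone surrogates,
        # which the strict handler refuses to encode.
        return False
    return True


print(is_well_encoded("report.txt", "utf-8"))        # True
print(is_well_encoded("report-\udcff.txt", "utf-8")) # False
```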
> I've taken the liberty of explicitly CCing Martin just in case he missed
> the thread with all the noise regarding PEP 383.
>
> If there are no objections from Martin
It's fine with me - I just won't have time to look into the details of
th
.
Regards,
Martin
ng, 'python-escape').decode('utf-8',
> 'python-escape') will always produce srcbytes ?
I think you mixed up bytes and unicode here: if srcbytes is indeed
a bytes object, then you can't apply .encode to it.
Regards,
Martin
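For the round trip in question, the direction that does work starts from bytes: decode first, then encode. A minimal sketch, assuming Python 3.1 or later, where the handler called python-escape here is spelled "surrogateescape"; the sample bytes are made up for illustration.

```python
# Assumption: the handler is available as "surrogateescape" (Python 3.1+).
srcbytes = b"caf\xe9 \xff\xfe"   # not valid UTF-8

# bytes objects have no .encode method in Python 3.
has_encode = hasattr(srcbytes, "encode")
print(has_encode)                # False

# decode -> encode with the same handler reproduces the input bytes.
as_text = srcbytes.decode("utf-8", "surrogateescape")
round_tripped = as_text.encode("utf-8", "surrogateescape")
print(round_tripped == srcbytes) # True
```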
declarative - no need to write code.
Regards,
Martin
nit__.py
2. addon1.tar, containing
simplistix/addon1.pth (containing a single "*")
simplistix/feature1.py
3. addon2.tar, containing
simplistix/addon2.pth
simplistix/feature2.py
Unpack each of them anywhere on sys.path, in any order.
Regards,
Martin
one know how to do the latter, not because there is
> no desire to do so!
>
> I, for one, have been trying to figure out how to do "base namespace"
> packages for years...
You mean, without PEP 382?
That won't be possible, unless you can coordinate all addon packages.
Ba
install into the same directory, you can have
base packages already.
Regards,
Martin
the name "python-escape" is not very
descriptive, so I've changed the name to "utf8b".
I've updated the PEP accordingly.
Regards,
Martin
> That's even nicer. One minor detail though, in the sentence:
>
> "non-decodable bytes >128 will be represented as lone half surrogate"
>
> ">" should be ">=".
Thanks, fixed.
Martin
t code points.
> Also, if utf8-b is not provided as a codec, will there be an easy way for user
> code to use the same encoding as the IO layer does?
s.encode(os.getfilesystemencoding(), "utf8b") will do just that (in
fact, that's ex
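A sketch of the call quoted above in a current Python, where the handler is spelled "surrogateescape" rather than "utf8b". The file name bytes are made up for illustration; since Python 3.2 the standard library also offers os.fsencode() and os.fsdecode() as wrappers for this conversion.

```python
import sys

# Assumption: an ASCII-compatible file system encoding, as the PEP requires.
fs_encoding = sys.getfilesystemencoding()

raw_name = b"report-\xff.txt"    # hypothetical on-disk file name

# What the IO layer would hand to Python code as a str ...
name = raw_name.decode(fs_encoding, "surrogateescape")

# ... and the same conversion back to bytes from user code.
encoded = name.encode(fs_encoding, "surrogateescape")
print(encoded == raw_name)       # True
```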
> it's an algorithm based on 16-bit or 32-bit code points.
>
>
> To me that lack of relationship with utf8 suggests that it should not be
> called utf8b
Perhaps. However, giving it that name was Markus Kuhn's choice - and
while it may be confusing, it's
code points, while the
> output is 16- or 32-bit code points.
Right - the algorithm maps between bytes and 16/32-bit code units.
It works, in particular, for UTF-8, and was originally proposed to apply
to UTF-8 - but it can work in any other place that c
In
> some error handlers, such as the 'utf8b' proposed here, it is also
> simpler and more efficient for the error handler to provide a
> pre-encoded replacement byte string, rather than forcing it to
> calculate Unicode from which the encoder would cr
the vacuum".
In any case, the discussion says
# Encodings that are not compatible with ASCII are not supported by
# this specification; bytes in the ASCII range that fail to decode
# will cause an exception. It is widely agreed that such encodings
# should not be used as locale charsets.
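The quoted rule can be observed with a codec that is not ASCII-compatible. This is a sketch under the assumption that the handler is available as "surrogateescape" (Python 3.1 or later); the helper decode_or_none is hypothetical and exists only to make the two outcomes easy to compare.

```python
from typing import Optional


def decode_or_none(raw: bytes, encoding: str) -> Optional[str]:
    """Hypothetical helper: decode with surrogateescape, None on failure."""
    try:
        return raw.decode(encoding, "surrogateescape")
    except UnicodeDecodeError:
        return None


# A byte >= 128 that fails to decode is escaped to a lone surrogate.
escaped = decode_or_none(b"\xff", "ascii")
print(ascii(escaped))            # '\udcff'

# A byte in the ASCII range that fails to decode is NOT escaped:
# a single byte is truncated data for UTF-16, and 0x61 < 128.
rejected = decode_or_none(b"a", "utf-16-le")
print(rejected)                  # None
```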
learly separate.
They *are* separate namespaces; that's guaranteed by the implementation.
Regards,
Martin
Stephen J. Turnbull wrote:
> "Martin v. Löwis" writes:
> > > It occurs to me that the PEP maybe should say that it is an error
> > > to have your POSIX locale set to UTF-16 or something like that.
> >
> > No. It is *impossible* to have UTF-
s allow them to do
what they want). Libraries can never enforce that applications conform
to some standard.
> Sorry! I suggest substituting the paragraph above for the paragraph
> which begins "The encode error handler interface presently requires..."
> at
hink
of ways to enhance the experience.
In any case, Python 3.1b1 may get released today, so it's way too late
for new features in the PEP. They can wait for Python 3.2.
Regards,
Martin
>>> The name "utf8b" suggested in the PEP is not in line with the codec
>>> design
>> Where is that design documented, and how exactly does the name violate
>> the design (chapter and verse, please).
>
> Martin, I designed the whole Python codec machine
't handle this
Not true. PEP 383 handles this very example just fine, with no problems
that I can see. Can you propose a specific example that you think might
cause problems? By "specific", I mean: what file names (exact bytes,
please), what locale charset, what API calls.
Regards,
> Judging by the existing names, I think that 'surrogate' would be
> reasonable
MAL's list of existing names is incomplete. "surrogates" is already
an existing name, also, and it means something different (similar,
b
Terry Reedy wrote:
> Glenn Linderman wrote:
>> On approximately 5/6/2009 3:08 AM, came the following characters from
>> the keyboard of MRAB:
>>> M.-A. Lemburg wrote:
>>>> Martin v. Löwis wrote:
>>
>>> Judging by the existing names, I think t
> Is it only usable with utf8 as an encoding?
No, it applies to any codec which potentially cannot decode
all bytes >127.
Regards,
Martin
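To illustrate that the handler is not tied to UTF-8, the sketch below applies it with two different codecs. It assumes Python 3.1 or later, where the handler is named "surrogateescape"; the choice of codecs and the sample byte are made up for illustration.

```python
# Assumption: handler available as "surrogateescape" (Python 3.1+).
raw = b"\xe9"   # undecodable as ASCII, and an incomplete sequence in UTF-8

results = {}
for encoding in ("ascii", "utf-8"):
    text = raw.decode(encoding, "surrogateescape")
    results[encoding] = text
    # The escape round-trips with whichever codec produced it.
    assert text.encode(encoding, "surrogateescape") == raw

print({enc: ascii(text) for enc, text in results.items()})
```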
but under the PEP as it is
> a large fraction of Shift JIS and Big5 filenames cannot be read under
> ASCII-compatible file system encodings using 'utf8b'. Yet it is those
> users who are placed at risk by PEP 383.
I think this statement is i
Antoine Pitrou wrote:
> Martin v. Löwis v.loewis.de> writes:
>> Despite there being also an error handler called "surrogates".
>
> People, perhaps we could end all the bikeshedding and call one of those
> handlers
> "surrogates-pass" and the o
e: I call it utf8b because that's
the established name for the algorithm it implements.
That algorithm was originally designed with UTF-8 in mind (and only
meant to be applied for UTF-8), however, it remains the same algorithm
even though PEP 383 widens its applic
st recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in
position 0: surrogates not allowed
py> "\ud800".encode("utf-8","surrogates")
b'\xed\xa0\x80'
py> b
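The session above can be reproduced in a released Python with one change: the handler shown here as "surrogates" is spelled "surrogatepass" in Python 3.1 and later. A minimal sketch under that assumption:

```python
# Assumption: Python 3.1+, where the handler is named "surrogatepass".
lone = "\ud800"

# The strict UTF-8 codec refuses to encode a lone surrogate.
try:
    lone.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)             # True

# With surrogatepass, the surrogate is encoded like any other code point.
encoded = lone.encode("utf-8", "surrogatepass")
print(encoded)                   # b'\xed\xa0\x80'

# The same handler is needed to decode those bytes again.
decoded = encoded.decode("utf-8", "surrogatepass")
print(decoded == lone)           # True
```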
Michael Urman wrote:
> On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" wrote:
>> Despite there being also an error handler called "surrogates".
>
> Not that I have to be, but I'm not sold on the previous UTF-8 codec
> behavior becoming an error handler
e("shift-jis","utf8b")
'\udc81/'
so the utf8b error handler will escape the first of the two bytes,
and then pass the second byte to the codec again, which then decodes
as ASCII.
Regards,
Martin
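The behaviour described above can be checked in a current Python, where the utf8b handler is spelled "surrogateescape". This is a sketch under that assumption, using the same two bytes as the example.

```python
# Assumption: Python 3.1+, handler spelled "surrogateescape".
raw = b"\x81/"   # 0x81 is a Shift-JIS lead byte; "/" is not a valid trail byte

# The lead byte is escaped; the second byte is decoded as plain ASCII.
text = raw.decode("shift_jis", "surrogateescape")
print(ascii(text))               # '\udc81/'

# Encoding with the same handler gives the original bytes back.
restored = text.encode("shift_jis", "surrogateescape")
print(restored == raw)           # True
```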