Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Martin v. Löwis
Stephen J. Turnbull wrote: Of course it must be supported. My point is that many strings (in my applications, all but those strings that result from slurping in a file or process output in one go -- example, not a statistically valid sample!) are not the beginning of "what once was a stream". It

Re: [Python-Dev] inconsistency when swapping obj.__dict__ with a dict-like object...

2005-04-05 Thread Brett C.
Alex A. Naanou wrote:
> Hi!
>
> here is a simple piece of code
>
> ---cut---
> class Dict(dict):
>     def __init__(self, dct={}):
>         self._dict = dct
>     def __getitem__(self, name):
>         return self._dct[name]
>     def __setitem__(self, name, value):
>         self._dct[name] = v

[Python-Dev] inconsistency when swapping obj.__dict__ with a dict-like object...

2005-04-05 Thread Alex A. Naanou
Hi!
here is a simple piece of code
---cut---
class Dict(dict):
    def __init__(self, dct={}):
        self._dict = dct
    def __getitem__(self, name):
        return self._dct[name]
    def __setitem__(self, name, value):
        self._dct[name] = value
    def __delitem__(self, name):
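As excerpted, the snippet stores the wrapped mapping as `self._dict` but reads and writes `self._dct`, so its item methods would raise AttributeError when called. The inconsistency the subject line refers to can still be sketched with a working subclass. The names `LoggingDict` and `Plain` are hypothetical, and the behaviour shown is CPython's (other implementations may differ): CPython accepts any dict subclass as an instance's `__dict__`, but attribute access reads and writes it through the C-level dict API, so Python-level `__getitem__`/`__setitem__` overrides are skipped.

```python
class LoggingDict(dict):
    """Hypothetical dict subclass that records calls to its overridden methods."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []  # lives in the LoggingDict's own attribute namespace, not as a key

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self.calls.append(("set", key))
        super().__setitem__(key, value)


class Plain:
    pass


obj = Plain()
ns = LoggingDict()
obj.__dict__ = ns  # CPython accepts a dict subclass here

obj.x = 1          # attribute assignment: stored without calling ns.__setitem__
value = obj.x      # attribute lookup: read without calling ns.__getitem__
print(value, ns.calls)

ns["y"] = 2        # explicit subscription does go through the override
print(obj.y, ns.calls)
```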

Re: [Python-Dev] Developer list update

2005-04-05 Thread Tim Peters
[Fred Drake] >> Would anyone here object to renaming the file to developers.txt, though? [Barry Warsaw] > +1, please! I voted with my DOS box. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscri

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Stephen J. Turnbull
> "Martin" == Martin v Löwis <[EMAIL PROTECTED]> writes: Martin> So people do use the "decode-it-all" mode, where no Martin> sequential access is necessary - yet the beginning of the Martin> string is still the beginning of what once was a Martin> stream. This case must be supp

Re: [Python-Dev] Developer list update

2005-04-05 Thread Barry Warsaw
On Tue, 2005-04-05 at 19:06, Fred Drake wrote: > Would anyone here object to renaming the file to developers.txt, though? +1, please! -Barry

Re: [Python-Dev] Developer list update

2005-04-05 Thread Fred Drake
On Tuesday 05 April 2005 06:47, Raymond Hettinger wrote: > Also, to help with institutional memory, I started a log of changes to > developer permissions. The goal is to remember who was given access, by > whom, and why (some folks are given access for a one-shot project for > example). The f

[Python-Dev] Developer list update

2005-04-05 Thread Raymond Hettinger
FYI, I'm starting a project to see what has become of some of the inactive developers. Essentially, it involves sending them a note to see if they still have use for their checkin permissions. If not, then we can make the change and improve security a bit. Also, to help with institutional memory

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Martin v. Löwis
Walter Dörwald wrote: There are situations where the byte stream might be temporarily exhausted, e.g. an XML parser that tries to support the IncrementalParser interface, or when you want to decode encoded data piecewise, because you want to give a progress report. Yes, but these are not file-like
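The second situation quoted here, decoding data piecewise in order to give a progress report, maps onto an incremental decoder. A minimal sketch using the standard `codecs.getincrementaldecoder`; the helper `decode_with_progress` and the tiny chunk size are hypothetical choices, picked so that chunk boundaries fall in the middle of UTF-16 code units.

```python
import codecs

def decode_with_progress(data, encoding, chunk_size=3):
    """Yield (decoded_piece, fraction_of_input_consumed) while decoding in chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    total = len(data)
    for start in range(0, total, chunk_size):
        chunk = data[start:start + chunk_size]
        # Bytes that do not yet form a whole character stay buffered in the decoder.
        yield decoder.decode(chunk), min(start + chunk_size, total) / total
    # final=True: no more input is coming; leftover bytes must decode or raise.
    yield decoder.decode(b"", final=True), 1.0

text = "héllo wörld"
data = text.encode("utf-16")  # starts with a BOM in the platform's byte order

pieces, progress = [], []
for piece, done in decode_with_progress(data, "utf-16"):
    pieces.append(piece)
    progress.append(done)
    print(f"{done:7.1%}  {piece!r}")

print("".join(pieces))
```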

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Walter Dörwald
Evan Jones wrote: > On Apr 5, 2005, at 15:33, Walter Dörwald wrote: >> The stateful decoder has a little problem: At least three bytes >> have to be available from the stream until the StreamReader >> decides whether these bytes are a BOM that has to be skipped. >> This means that if the file only

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Walter Dörwald
Martin v. Löwis wrote: > Walter Dörwald wrote: >> The stateful decoder has a little problem: At least three bytes >> have to be available from the stream until the StreamReader >> decides whether these bytes are a BOM that has to be skipped. >> This means that if the file only contains "ab", the us

Re: [Python-Dev] longobject.c & ob_size

2005-04-05 Thread Tim Peters
[Michael Hudson] > Asking mostly for curiosity, how hard would it be to have longs store > their sign bit somewhere less aggravating? Depends on where that is. > It seems to me that the top bit of ob_digit[0] is always 0, for example, Yes, the top bit of ob_digit[i], for all relevant i, is 0 on
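A small Python model of the layout under discussion, offered as a sketch only: it assumes 15-bit digits (the width longobject.c used at the time; modern CPython builds commonly use 30-bit digits) and mimics the convention that the sign lives in ob_size while the digits hold the magnitude. `to_digits` and `from_digits` are hypothetical helpers, not CPython APIs. Because every digit is below 2**15, a digit held in a 16-bit slot always has its top bit clear, which is the spare bit being asked about.

```python
SHIFT = 15                  # assumed digit width (2005-era longobject.c)
MASK = (1 << SHIFT) - 1     # 0x7FFF

def to_digits(n):
    """Split int n into (ob_size, digits): sign in ob_size, magnitude as little-endian digits."""
    magnitude = abs(n)
    digits = []
    while magnitude:
        digits.append(magnitude & MASK)
        magnitude >>= SHIFT
    size = len(digits)
    return (-size if n < 0 else size), digits

def from_digits(size, digits):
    """Inverse of to_digits: rebuild the int from ob_size and its digits."""
    magnitude = 0
    for digit in reversed(digits):
        magnitude = (magnitude << SHIFT) | digit
    return -magnitude if size < 0 else magnitude

for n in (0, 1, -1, 2**15, -(2**40 + 7)):
    size, digits = to_digits(n)
    # Every digit fits in 15 bits, so bit 15 of a 16-bit slot is never used.
    assert all(d >> SHIFT == 0 for d in digits)
    assert from_digits(size, digits) == n
    print(n, size, digits)
```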

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Martin v. Löwis
Walter Dörwald wrote: The stateful decoder has a little problem: At least three bytes have to be available from the stream until the StreamReader decides whether these bytes are a BOM that has to be skipped. This means that if the file only contains "ab", the user will never see these two character
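A codec along these lines later shipped with Python as `utf-8-sig`, and in current CPython its incremental decoder narrows the problem described here: it holds bytes back only while they could still be the start of a BOM, so input such as `ab` comes out at once. The sketch below only exercises the standard `codecs.getincrementaldecoder("utf-8-sig")`; the `feed` helper and the chunkings are hypothetical.

```python
import codecs

def feed(chunks, final=True):
    """Feed byte chunks to a fresh utf-8-sig incremental decoder.

    Returns the text produced by each call, plus one final flushing call
    when `final` is true.
    """
    decoder = codecs.getincrementaldecoder("utf-8-sig")()
    outputs = [decoder.decode(chunk) for chunk in chunks]
    if final:
        outputs.append(decoder.decode(b"", final=True))
    return outputs

# b"ab" cannot be the start of a BOM, so it is returned immediately.
print(feed([b"ab"], final=False))
# A BOM split across reads: its first two bytes are held back, then dropped.
print(feed([b"\xef\xbb", b"\xbfab"]))
# Once real text has been seen, a later U+FEFF is ordinary data.
print(feed([b"a", b"\xef\xbb\xbf"]))
```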

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Fred Drake
On Tuesday 05 April 2005 15:53, Evan Jones wrote: > This functionality is provided by a flush() method on similar objects, > such as the zlib compression objects. Or by close() on other objects (htmllib, HTMLParser, the SAX incremental parser, etc.). Too bad there's more than one way to do it.
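Python's incremental codec interface takes yet another route: instead of a separate flush() or close(), `IncrementalDecoder.decode()` accepts `final=True`, which forces out buffered bytes and raises if they do not form complete characters. A minimal sketch with UTF-8; `decode_stream` is a hypothetical helper.

```python
import codecs

def decode_stream(chunks, encoding="utf-8"):
    """Decode byte chunks incrementally, flushing at the end."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True plays the role of flush()/close(): any bytes still buffered
    # must now form complete characters, otherwise UnicodeDecodeError is raised.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

# A two-byte UTF-8 sequence split across chunks is reassembled.
print(decode_stream([b"a\xc3", b"\xa9"]))

try:
    decode_stream([b"a\xc3"])  # the stream ends in the middle of a character
except UnicodeDecodeError as exc:
    print("truncated input:", exc.reason)
```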

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Evan Jones
On Apr 5, 2005, at 15:33, Walter Dörwald wrote: The stateful decoder has a little problem: At least three bytes have to be available from the stream until the StreamReader decides whether these bytes are a BOM that has to be skipped. This means that if the file only contains "ab", the user will nev

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Walter Dörwald
Walter Dörwald wrote: > M.-A. Lemburg wrote: > >>> [...] >>> With the UTF-8-SIG codec, it would apply to all operation >>> modes of the codec, whether stream-based or from strings. Whether >>> or not to use the codec would be the application's choice. >> >> I'd suggest to use the same mode of operat

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Martin v. Löwis
Stephen J. Turnbull wrote: Martin> With the UTF-8-SIG codec, it would apply to all operation Martin> modes of the codec, whether stream-based or from strings. I had in mind the ability to treat a string as a stream. Hmm. A string is not a stream, but it could be the contents of a stream. A
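The point that one codec can serve both the stream-based and the from-string modes can be illustrated with the `utf-8-sig` codec that later Python versions ship. A minimal sketch: the sample bytes are made up, and `io.BytesIO` stands in for a real file.

```python
import io

raw = b"\xef\xbb\xbfhello"  # made-up file contents: UTF-8 signature + text

# "Decode-it-all" mode: the whole file was slurped into one bytes object.
whole = raw.decode("utf-8-sig")

# Stream mode: the same codec named when the text stream is created.
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig")
streamed = stream.read()

# Plain utf-8 keeps the signature as U+FEFF at the start of the text.
plain = raw.decode("utf-8")

print(repr(whole), repr(streamed), repr(plain))
```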

Re: [Python-Dev] Mail.python.org

2005-04-05 Thread Skip Montanaro
Grant> Not a big deal, but I noticed that https://mail.python.org/ is Grant> live and shows a generic "Welcome to your new home in Grant> cyberspace!" message. One of the webmasters may want to Grant> automatically redirect to http://mail.python.org. Thanks, I forwarded this alon

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Stephen J. Turnbull
>>"MAL" == M <[EMAIL PROTECTED]> writes: MAL> Stephen J. Turnbull wrote: >> The Japanese "memopado" (Notepad) uses UTF-8 signatures; it >> even adds them to existing UTF-8 files lacking them. MAL> Is that a MS application ? AFAIK, notepad, wordpad and MS MAL> Office alwa

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Stephen J. Turnbull
> "Martin" == Martin v Löwis <[EMAIL PROTECTED]> writes: Martin> Stephen J. Turnbull wrote: >> However, this option should be part of the initialization of an >> IO stream which produces Unicodes, _not_ an operation on >> arbitrary internal strings (whether raw or Unicode).

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread M.-A. Lemburg
Stephen J. Turnbull wrote: >>"MAL" == M <[EMAIL PROTECTED]> writes: > > > MAL> The BOM (byte order mark) was a non-standard Microsoft > MAL> invention to detect Unicode text data as such (MS always uses > MAL> UTF-16-LE for Unicode text files). > > The Japanese "memopado" (Notep
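Detecting Unicode text by its leading signature amounts to comparing the first bytes against the BOM constants in the standard `codecs` module. A minimal sketch; `sniff_encoding`, `decode_sniffed` and the UTF-8 fallback are hypothetical choices. The UTF-32 signatures are checked first because the UTF-32-LE BOM begins with the UTF-16-LE one.

```python
import codecs

# Longest signatures first: BOM_UTF32_LE (FF FE 00 00) starts with BOM_UTF16_LE (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(data, default="utf-8"):
    """Return (encoding, bom_length) guessed from a leading byte order mark."""
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0

def decode_sniffed(data):
    """Strip a recognised BOM and decode the rest with the matching codec."""
    encoding, skip = sniff_encoding(data)
    return data[skip:].decode(encoding)

print(sniff_encoding(codecs.BOM_UTF16_LE + "memo".encode("utf-16-le")))
print(decode_sniffed(codecs.BOM_UTF8 + "memo".encode("utf-8")))
```

The check is heuristic: UTF-16-LE text whose first character is U+0000 is indistinguishable from a UTF-32-LE signature.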

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Walter Dörwald
M.-A. Lemburg wrote: [...] With the UTF-8-SIG codec, it would apply to all operation modes of the codec, whether stream-based or from strings. Whether or not to use the codec would be the application's choice. I'd suggest to use the same mode of operation as we have in the UTF-16 codec: it removes
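The UTF-16 mode of operation mentioned here can be checked directly: the byte-order-neutral `utf-16` codec consumes a leading BOM and uses it to choose the byte order, while the explicit `utf-16-le` codec treats the same bytes as an ordinary U+FEFF character. A minimal sketch with made-up input:

```python
import codecs

data = codecs.BOM_UTF16_LE + "ab".encode("utf-16-le")

print(repr(data.decode("utf-16")))     # BOM consumed, little-endian order chosen
print(repr(data.decode("utf-16-le")))  # BOM kept as U+FEFF ZERO WIDTH NO-BREAK SPACE

# A big-endian BOM switches the byte order used for the rest of the data.
print(repr((codecs.BOM_UTF16_BE + "ab".encode("utf-16-be")).decode("utf-16")))

# Encoding with the neutral codec writes a BOM in front of the data.
print("ab".encode("utf-16")[:2])
```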

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: > Stephen J. Turnbull wrote: > >> So there is a standard for the UTF-8 signature, and I know of >> applications which produce it. While I agree with you that Python's >> codecs shouldn't produce it (by default), providing an option to strip >> is a good idea. > > I would p

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread Martin v. Löwis
Stephen J. Turnbull wrote: So there is a standard for the UTF-8 signature, and I know of applications which produce it. While I agree with you that Python's codecs shouldn't produce it (by default), providing an option to strip is a good idea. I would personally like to see an "utf-8-bom" codec (p
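A codec along the lines proposed here ships with later Python releases under the name `utf-8-sig`: on encoding it writes the signature, and on decoding it strips at most one leading signature and also accepts input without one. A minimal sketch:

```python
encoded = "ab".encode("utf-8-sig")
print(encoded)                      # signature prepended to the UTF-8 bytes
print(encoded.decode("utf-8-sig"))  # signature stripped again
print(b"ab".decode("utf-8-sig"))    # no signature: decoded as plain UTF-8

# Only the first signature is removed; a second one survives as U+FEFF.
print(repr((b"\xef\xbb\xbf" * 2 + b"ab").decode("utf-8-sig")))
```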