On Thu, May 7, 2009 at 00:43, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
> Michael Urman wrote:
>> On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
>>> Despite there being also an error handler called "surrogates".
>>
>> Not that I have to be, but I'm not sold on the previous UTF-8 codec
>> behavior becoming an error handler of the name "surrogates" for two
>> reasons (I do respect the obvious PBP argument for the implementation,
>> and have no better name - "lenient"?).
>
> PBP?

Practicality beats purity. From a purity standpoint, the legacy
handling of invalid UTF-8 seems more like an encoding than an error
handler to me. From a practicality standpoint, it's presumably much
more convenient to implement it on top of the new, strict UTF-8
codec's behavior. And then any error handler needs a name.

> Well, there is a way to stack error handlers, although it's not pretty:
> [...]
> codecs.register_error("surrogates_then_replace",
>                      surrogates_then_replace)

That mitigates my arguments significantly, although I'd rather see
something like errors=('surrogates', 'replace') chain the handlers
without additional registrations. But that's a different PEP or
arbitrary change. :)
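
For what it's worth, the chaining could be approximated with a small
helper that builds and registers the combined handler. This is only a
sketch: chain_error_handlers is a hypothetical name, not an existing
codecs API, and it assumes the handler discussed here as "surrogates"
is available under the name "surrogateescape", as it is in Python 3.1
and later.

```python
import codecs


def chain_error_handlers(name, *handler_names):
    """Register `name` as an error handler that tries each of
    `handler_names` in order (hypothetical helper, not a codecs API)."""
    if not handler_names:
        raise ValueError("at least one handler name is required")
    handlers = [codecs.lookup_error(h) for h in handler_names]

    def chained(exc):
        for handler in handlers[:-1]:
            try:
                return handler(exc)
            except UnicodeError:
                # This handler declined the error; try the next one.
                continue
        # The last handler's result (or exception) is final.
        return handlers[-1](exc)

    codecs.register_error(name, chained)
    return name


# Assumption: "surrogateescape" is the released name of "surrogates".
chain_error_handlers("surrogates_then_replace", "surrogateescape", "replace")
```

With that registered, "caf\udce9".encode("ascii",
"surrogates_then_replace") lets the first handler turn the lone
surrogate back into the byte 0xE9, while a character such as U+20AC
falls through to "replace" and becomes "?".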

>> The stacking argument also applies to the new utf8b behavior on encode
>> (only, as it handles all errors on decode). This may be a YAGNI
>
> Indeed - in particular, as, in the primary application of this error
> handler (i.e. file IO operations), there is no way of specifying
> an additional error handler anyway.

Would it be useful to allow setting this somewhere? It'd be analogous
to setfsencoding, perhaps a setfsencodingerrors. It's not hard to
imagine an application working on Windows where all Unicode characters
are valid, and constructing backup filenames by adding some arbitrary
character, or receiving them from a user who doesn't understand
encodings. When this application is taken to a non-Unicode filesystem,
without the ability to say "I really want a valid filename: so
replace", that could get messy. But it may still be a YAGNI, or a
"don't do that."

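To make the scenario concrete, here is a rough sketch of what "I really
want a valid filename" might look like if the application has to do the
replacement itself. The helper name force_encodable and its encoding
parameter are made up for illustration; a real application would
presumably rely on the default, sys.getfilesystemencoding().

```python
import sys


def force_encodable(filename, encoding=None):
    """Return `filename` with every character the target encoding cannot
    represent replaced by '?' (hypothetical helper for illustration)."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # Round-trip through the encoding; "replace" substitutes '?' for
    # anything that does not fit, including lone surrogates.
    return filename.encode(encoding, "replace").decode(encoding)
```

For example, force_encodable("backup\u2020.txt", "latin-1") gives
"backup?.txt", while a name that already fits the encoding comes back
unchanged.
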
-- 
Michael Urman
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev