>> The Python UTF-8 codec will happily encode half-surrogates; people argue
>> that it is a bug that it does so, however, it would help in this
>> specific case.
> 
> Can we use this encoding scheme for writing into files as well?  We've
> turned the filename with undecodable bytes into a string with half
> surrogates.  Putting that string into a file has to turn them into bytes
> at some level.  Can we use the python-escape error handler to achieve
> that somehow?

Sure: if you know that what you are writing to the stream is actually
a file name, you should encode it with the file system encoding and
the python-escape handler. However, it's questionable whether the same
approach is right for the rest of the data that goes into the file.
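
A minimal sketch of the file-name case, under some assumptions: the
handler behaves like the one Python 3 ships under the name
"surrogateescape" (python-escape is the name used in this thread),
UTF-8 stands in for the file system encoding, and the sample bytes
and the helper encode_file_name are made up for illustration.

```python
import io

FS_ENCODING = "utf-8"        # assumption: stand-in for the file system encoding
HANDLER = "surrogateescape"  # assumption: shipped counterpart of python-escape

# Hypothetical file name as the OS returned it: Latin-1 bytes, invalid as UTF-8.
raw_name = b"caf\xe9.txt"

# Decoding turns the undecodable byte 0xE9 into the half surrogate U+DCE9.
name = raw_name.decode(FS_ENCODING, HANDLER)


def encode_file_name(name):
    """Encode a file name with the file system encoding and the escape handler."""
    return name.encode(FS_ENCODING, HANDLER)


# Write the name to a byte stream; the half surrogate becomes 0xE9 again.
stream = io.BytesIO()
stream.write(encode_file_name(name) + b"\n")

print(ascii(name))
print(stream.getvalue() == raw_name + b"\n")
```

The stream is opened for bytes here, so that the file name can be
encoded separately from whatever else is written to it.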

If you use a different encoding on the stream, yet still use the
python-escape handler, you may end up with completely nonsensical
bytes. In practice, it probably won't be that bad: python-escape
has likely escaped all non-ASCII bytes, so that on re-encoding with
a different encoding, only the ASCII characters get encoded, which
will likely work fine.
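
To illustrate both outcomes, here is a rough sketch, again assuming
the handler behaves like Python 3's "surrogateescape" and taking UTF-8
as the file system encoding and Latin-1 as the stream encoding. The
helper reencode and the sample byte strings are hypothetical.

```python
HANDLER = "surrogateescape"  # assumption: shipped counterpart of python-escape


def reencode(raw, stream_encoding, fs_encoding="utf-8"):
    """Decode raw bytes with the file system encoding, then encode for the stream."""
    text = raw.decode(fs_encoding, HANDLER)
    return text.encode(stream_encoding, HANDLER)


# Benign case: the only non-ASCII byte was escaped, so only ASCII characters
# are really encoded and the original bytes come back unchanged.
benign = reencode(b"caf\xe9.txt", "latin-1")

# Mixed case: a valid UTF-8 sequence for U+00E9 followed by a stray 0xE9 byte.
# Both end up as the same byte, so the result no longer matches the input.
mixed = reencode(b"\xc3\xa9\xe9", "latin-1")

print(benign)
print(mixed)
```

In the mixed case the output is valid in neither encoding as a
faithful copy of the input, which is the kind of result to expect
when decodable non-ASCII characters and escaped bytes share a string.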

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev