>> The Python UTF-8 codec will happily encode half-surrogates; people argue
>> that it is a bug that it does so, however, it would help in this
>> specific case.
>
> Can we use this encoding scheme for writing into files as well? We've
> turned the filename with undecodable bytes into a string with half
> surrogates. Putting that string into a file has to turn them into bytes
> at some level. Can we use the python-escape error handler to achieve
> that somehow?

Sure: if you know that what you are writing to the stream is actually a
file name, you should encode it with the file system encoding and the
python-escape handler.

However, it's questionable whether the same approach is right for the
rest of the data that goes into the file. If you use a different
encoding on the stream, yet still use the python-escape handler, you may
end up with completely nonsensical bytes. In practice, it probably won't
be that bad: python-escape has likely escaped all non-ASCII bytes, so on
re-encoding with a different encoding only the ASCII characters get
encoded, which will likely work fine.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
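
To make the round trip described in the reply concrete, here is a minimal sketch. It assumes the python-escape handler behaves like the handler available in Python 3 under the name `surrogateescape`, and that the file system encoding is UTF-8. The helper names `to_text` and `to_bytes` and the sample file names are hypothetical, chosen only for illustration.

```python
import os
import tempfile

# Assumption: "surrogateescape" stands in for the python-escape handler.
HANDLER = "surrogateescape"


def to_text(raw: bytes, fs_encoding: str = "utf-8") -> str:
    """Decode a file name; undecodable bytes become half-surrogates."""
    return raw.decode(fs_encoding, HANDLER)


def to_bytes(name: str, encoding: str = "utf-8") -> bytes:
    """Encode a string; half-surrogates turn back into the escaped bytes."""
    return name.encode(encoding, HANDLER)


# A Latin-1 file name on a system whose file system encoding is UTF-8.
raw_name = b"caf\xe9.txt"
name = to_text(raw_name)  # the 0xE9 byte is escaped as U+DCE9

# Writing the name to a stream that uses the same encoding and handler
# reproduces the original bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "listing.txt")
    with open(path, "w", encoding="utf-8", errors=HANDLER, newline="") as f:
        f.write(name + "\n")
    with open(path, "rb") as f:
        written = f.read()
print(written)  # b'caf\xe9.txt\n'

# Only ASCII characters plus escaped bytes: a different stream encoding
# still yields the original bytes.
print(to_bytes(name, "latin-1"))  # b'caf\xe9.txt'

# A name mixing a validly decoded non-ASCII character with an escaped
# byte does not survive a change of encoding.
mixed_raw = b"\xc3\xa9\xff"  # UTF-8 for U+00E9, then an invalid byte
mixed = to_text(mixed_raw)
reencoded = to_bytes(mixed, "latin-1")
print(reencoded)  # b'\xe9\xff'
```

The last case is the "nonsensical bytes" situation: the escaped byte comes back unchanged, but the character that was decoded properly is re-encoded under the new encoding, so the result no longer names the original file.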