On Thu, Jul 7, 2011 at 8:53 PM, Vinay Sajip <vinay_sa...@yahoo.co.uk> wrote:
> I've no issue with telling people to use open() rather than codecs.open() when
> moving code from 2.x to 3.x. But in 2.x, is there any other API which allows you
> to wrap arbitrary streams? If not, then ISTM that removing the Stream* classes
> would give 2.x->3.x porting projects more trouble than codecs.open() ->
> open().
No, using the io module is a far more robust way to wrap arbitrary streams than using the codecs module.

It's unfortunate that nobody pointed out the redundancy when PEP 3116 was discussed and implemented. Had someone done so, I expect PEP 100 would have been updated, and the Stream* APIs would have been either reused or officially jettisoned as part of the Py3k migration.

However, we're now in a situation where we have:

1. A robust Unicode-capable IO implementation (the io module, based on PEP 3116). It is available in both 2.x and 3.x and is designed to minimise the amount of work involved in writing new codecs.
2. A legacy IO implementation (the codecs module). It is also available in both 2.x and 3.x, but it requires additional work on the part of codec authors and isn't as robust as the PEP 3116 implementation.

So the options are:

A. Bring the codecs module IO implementation up to the standard of the io module implementation (less the C acceleration), and maintain the requirement that codec authors provide StreamReader and StreamWriter implementations.
B. Retain the full codecs module API, but reimplement the relevant parts in terms of the io module.
C. Deprecate the codecs.Stream* interfaces and make codecs.open() a simple wrapper around the builtin open() function. Formally drop the requirement that codec authors provide StreamReader/StreamWriter implementations (since they are not used by the core IO implementation).

Currently, nobody has stepped forward to maintain the codecs IO implementation independently of the io module, so the only two options seriously on the table are B and C. That may change if someone actually goes through and *fixes* all the error cases that the io module handles correctly but the codecs module doesn't, and credibly promises to keep doing so for at least the life of 3.3.
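The sketch below illustrates two of the points above in Python 3. The first is wrapping an arbitrary binary stream with io.TextIOWrapper. The second is roughly what option C's "simple wrapper around the builtin open()" might look like.

Both names, wrap_stream() and codecs_style_open(), are hypothetical; neither comes from any actual patch. The buffering and newline handling reflect my assumptions about how closely such a wrapper should mimic codecs.open(). Arguments are passed to io.open() by keyword because codecs.open() and the builtin open() order their parameters differently.

```python
import io


def wrap_stream(raw, encoding, errors="strict"):
    """Layer text decoding/encoding over an arbitrary binary stream.

    io.TextIOWrapper accepts any buffered binary stream, such as
    io.BytesIO or a file opened in 'rb' mode.
    """
    return io.TextIOWrapper(raw, encoding=encoding, errors=errors)


def codecs_style_open(filename, mode="rb", encoding=None, errors="strict",
                      buffering=1):
    """Hypothetical option C sketch: keep codecs.open()'s argument order,
    but delegate to io.open() (which is the builtin open() in 3.x)."""
    if encoding is None:
        # Without an encoding, codecs.open() just returned a plain file.
        if "b" in mode and buffering == 1:
            # io.open() only supports line buffering in text mode.
            buffering = -1
        return io.open(filename, mode=mode, buffering=buffering)
    # With an encoding, ask io.open() for a text file instead of binary.
    text_mode = mode.replace("b", "")
    # codecs.open() did no newline translation; newline="" approximates that.
    # Everything is passed by keyword because the positional order differs.
    return io.open(filename, mode=text_mode, buffering=buffering,
                   encoding=encoding, errors=errors, newline="")
```

For example, wrap_stream(io.BytesIO(b"caf\xc3\xa9"), "utf-8").read() returns "café". Codec authors would only need to supply an incremental decoder and encoder for this to work; no StreamReader or StreamWriter is involved.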
A 2to3 fixer that simply changes "codecs.open" to "open" is not viable, as the function signatures are not compatible (the buffering parameter appears in a different location):

  codecs.open():
    open(filename, mode='rb', encoding=None, errors='strict', buffering=1)

  3.x builtin open():
    open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True)

Now, the backported io module does make it possible to write correct code as far back as 2.6 that can be forward ported cleanly to 3.x without requiring code modifications. However, it would be nice to transparently upgrade code that uses codecs.open to the core IO implementation in 3.x. For people new to Python, the parallel (and currently deficient) alternative IO implementation also qualifies, at the very least, as an attractive nuisance.

Now, it may be that this PEP runs afoul of Guido's stated preference not to introduce any more backwards incompatibilities between 2.x and 3.x that aren't absolutely essential. In that case, it may be reasonable to add an option D to the mix. Under D, we would just add documentation notes telling people not to use the affected codecs module APIs, and officially declare that bug reports on those APIs will be handled with "don't use these, use the io module instead". That would also deal with the maintenance problem. It's pretty ugly from an end user's point of view, though.

Regards,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev