On Tue, Sep 28, 2010 at 9:29 PM, Michael Foord <fuzzy...@voidspace.org.uk> wrote:
> On 28/09/2010 12:19, Antoine Pitrou wrote:
>> On Mon, 27 Sep 2010 23:45:45 -0400
>> Steve Holden <st...@holdenweb.com> wrote:
>>> On 9/27/2010 11:27 PM, Benjamin Peterson wrote:
>>>> Tokenize only works on bytes. You can open a feature request if you
>>>> desire.
>>>>
>>> Working only on bytes does seem rather perverse.
>>
>> I agree, the morality of bytes objects could have been better :)
>>
> The reason for working with bytes is that source data can only be correctly
> decoded to text once the encoding is known. The encoding is determined by
> reading the encoding cookie.
>
> I certainly wouldn't be opposed to an API that accepts a string as well
> though.
A very quick scan of _tokenize suggests it is designed to support
detect_encoding returning None to indicate that the line iterator will
return already decoded lines. This is confirmed by the fact that the
standard library uses it that way (via generate_tokens).

An API that accepts a string, wraps a StringIO around it, then calls
_tokenize with an encoding of None would appear to be the answer here. A
feature request on the tracker is the best way to make that happen.

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
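P.S. For anyone who wants to experiment in the meantime, here is a rough,
untested sketch of the kind of wrapper I mean. It pokes at the private
_tokenize helper, so treat it as illustrative only; tokenize_str is just a
placeholder name, not a proposed spelling for the eventual API:

    import io
    import tokenize

    def tokenize_str(source):
        # Hypothetical helper, not part of the stdlib: tokenize an
        # already-decoded str the same way generate_tokens does, by
        # handing _tokenize a readline callable and encoding=None.
        readline = io.StringIO(source).readline
        return tokenize._tokenize(readline, None)

    # Example: dump the tokens for a trivial piece of source text
    for tok in tokenize_str("x = 1 + 2\n"):
        print(tok)

The real version would presumably live in the tokenize module itself rather
than in user code, which is why a tracker issue is the right next step.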