On Tue, Sep 28, 2010 at 9:29 PM, Michael Foord <fuzzy...@voidspace.org.uk> wrote:
> On 28/09/2010 12:19, Antoine Pitrou wrote:
>> On Mon, 27 Sep 2010 23:45:45 -0400
>> Steve Holden <st...@holdenweb.com> wrote:
>>> On 9/27/2010 11:27 PM, Benjamin Peterson wrote:
>>>> Tokenize only works on bytes. You can open a feature request if you
>>>> desire.
>>>>
>>> Working only on bytes does seem rather perverse.
>>
>> I agree, the morality of bytes objects could have been better :)
>>
> The reason for working with bytes is that source data can only be correctly
> decoded to text once the encoding is known. The encoding is determined by
> reading the encoding cookie.
>
> I certainly wouldn't be opposed to an API that accepts a string as well
> though.
A very quick scan of _tokenize suggests it is designed to support
detect_encoding returning None to indicate that the line iterator will
return already decoded lines. This is confirmed by the fact that the
standard library uses it that way (via generate_tokens).

An API that accepts a string, wraps a StringIO around it, then calls
_tokenize with an encoding of None would appear to be the answer here. A
feature request on the tracker is the best way to make that happen.

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
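P.S. For anyone who wants to experiment in the meantime, here is a rough,
untested sketch of the kind of wrapper I mean. It pokes at the private
_tokenize helper, so treat it as illustrative only; tokenize_str is just a
placeholder name, not a proposed spelling for the eventual API:

    import io
    import tokenize

    def tokenize_str(source):
        # Hypothetical helper, not part of the stdlib: tokenize an
        # already-decoded str the same way generate_tokens does, by
        # handing _tokenize a readline callable and encoding=None.
        readline = io.StringIO(source).readline
        return tokenize._tokenize(readline, None)

    # Example: dump the tokens for a trivial piece of source text
    for tok in tokenize_str("x = 1 + 2\n"):
        print(tok)

The real version would presumably live in the tokenize module itself rather
than in user code, which is why a tracker issue is the right next step.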