Re: [Python-Dev] issue2180 and using 'tokenize' with Python 3 'str's

Michael Foord Tue, 28 Sep 2010 17:28:09 -0700

On 29 Sep 2010, at 00:22, "Martin v. Löwis" <mar...@v.loewis.de> wrote:


>> I certainly wouldn't be opposed to an API that accepts a string as well
>> though.
> 
> Notice that this can't really work for Python 2 source code (but of
> course, it doesn't need to).
> 
> In Python 2, if you have a string literal in the source code, you need
> to know the source encoding in order to get the bytes *back*. Now,
> if you parse a Unicode string as source code, and it contains byte
> string literals, you wouldn't know what encoding to apply.
> 
> Fortunately, Python 3 byte literals ban non-ASCII literal characters,
> so assuming an ASCII-compatible encoding for the original source is
> fairly safe.
> 

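Martin's point about byte literals can be checked with a minimal sketch, assuming a current Python 3 interpreter. The helper name bytes_literal_compiles is hypothetical and only serves this illustration: a bytes literal containing a raw non-ASCII character is rejected at compile time, while the escaped form is accepted.

```python
def bytes_literal_compiles(source):
    """Return True if `source` (a str holding an expression) compiles."""
    # Hypothetical helper for illustration; not part of any library.
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# Pure-ASCII bytes literal: accepted.
ascii_ok = bytes_literal_compiles('b"abc"')

# Non-ASCII byte spelled as an escape sequence: accepted.
escaped_ok = bytes_literal_compiles('b"\\xe9"')

# Raw non-ASCII character inside a bytes literal: rejected in Python 3.
non_ascii_ok = bytes_literal_compiles('b"\u00e9"')
```

Because such literals never compile, no Python 3 bytes literal depends on the encoding of the source file it was written in.
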
The new API couldn't be ported to Python 2 *anyway*. As Nick pointed out, the
underlying tokenization happens on decoded strings - so starting with source as
Unicode will be fine.
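
As a rough sketch of what that looks like in practice, assuming a current Python 3 where tokenize.generate_tokens accepts a readline callable returning str (this is not necessarily the API that was under discussion for the issue), the same source can be tokenized from bytes or from an already-decoded string. The variable names are illustrative only.

```python
import io
import tokenize

source = 's = "h\u00e9llo"\n'

# str path: the readline callable yields decoded text, so no encoding
# detection is involved.
str_tokens = [
    (tok.type, tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# bytes path: tokenize.tokenize() detects the encoding itself and reports
# it as a leading ENCODING token before the ordinary tokens.
byte_tokens = [
    (tok.type, tok.string)
    for tok in tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline)
]
```

Apart from the leading ENCODING token on the bytes path, both calls should produce the same token stream, which is consistent with tokenization operating on decoded text.
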

Michael 




> Regards,
> Martin
