[issue12486] tokenize module should have a unicode API

2018-06-05 Thread Thomas Kluyver
Thomas Kluyver added the comment: Thanks Carol :-)

2018-06-05 Thread Carol Willing
Change by Carol Willing: resolution: -> fixed; stage: patch review -> resolved; status: open -> closed

2018-06-05 Thread Carol Willing
Carol Willing added the comment: New changeset c56b17bd8c7a3fd03859822246633d2c9586f8bd by Carol Willing (Thomas Kluyver) in branch 'master': bpo-12486: Document tokenize.generate_tokens() as public API (#6957) https://github.com/python/cpython/commit/c56b17bd8c7a3fd03859822246633d2c9586f8bd
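The commit above documents the existing str-based entry point rather than adding a new one. A minimal sketch of what the now-public API looks like in use (the sample source string is mine, not from the thread):

```python
import io
import tokenize

# generate_tokens() takes a readline callable that yields str,
# so no encoding detection is involved (unlike tokenize.tokenize(),
# which wants a bytes readline).
source = "x = 1 + 2\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    print(tok.type, repr(tok.string))
```

The first token is the NAME `x`; there is no leading ENCODING token, which is exactly the difference from the bytes API discussed below.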

2018-05-28 Thread Thomas Kluyver
Thomas Kluyver added the comment: The tests on PR #6957 are passing now, if anyone has time to have a look. :-)

2018-05-18 Thread Thomas Kluyver
Thomas Kluyver added the comment: Thanks - I had forgotten it, just fixed it now.

2018-05-18 Thread Martin Panter
Martin Panter added the comment: Don’t forget about updating __all__.
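Making a name part of the documented API also means listing it in the module's `__all__`, which is what Martin is reminding Thomas of here. A quick sanity check against a modern CPython (3.8+, where PR #6957 has landed):

```python
import tokenize

# PR #6957 added generate_tokens to tokenize.__all__, so it is now
# an exported, documented name rather than an undocumented internal:
assert "generate_tokens" in tokenize.__all__
assert callable(tokenize.generate_tokens)
```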

2018-05-18 Thread Thomas Kluyver
Thomas Kluyver added the comment: I agree, it's not a good design, but it's what's already there; I just want to ensure that it won't be removed without a deprecation cycle. My PR makes no changes to behaviour, only to documentation and tests. This and issue 9969 have both been around for several years.

2018-05-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: My concern is that we will have two functions with dissimilar names (tokenize() and generate_tokens()) that do virtually the same thing but accept different types of input (bytes or str), and the single function untokenize() that produces a different type of result depending on its input.
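The asymmetry Serhiy describes is easy to demonstrate (the tiny sample source is mine; the behaviour is the stdlib's):

```python
import io
import tokenize

source = "x = 1\n"

# str API: no ENCODING token is emitted
str_tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# bytes API: the first token reports the detected encoding
bytes_tokens = list(
    tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline)
)
assert bytes_tokens[0].type == tokenize.ENCODING

# untokenize() mirrors the asymmetry: its result type depends on
# whether the token stream starts with an ENCODING token
assert isinstance(tokenize.untokenize(str_tokens), str)
assert isinstance(tokenize.untokenize(bytes_tokens), bytes)
```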

2018-05-18 Thread Thomas Kluyver
Thomas Kluyver added the comment: I wouldn't say it's a good name, but I think the advantage of documenting an existing name outweighs that. We can start (or continue) using generate_tokens() right away, whereas a new name presumably wouldn't be available until Python 3.8 comes out. And we use …

2018-05-18 Thread Serhiy Storchaka
Change by Serhiy Storchaka: nosy: +barry, mark.dickinson, michael.foord, trent; versions: +Python 3.8 -Python 3.6

2018-05-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The old generate_tokens() was renamed to tokenize() in issue719888 because the latter is a better name. Is "generate_tokens" considered a good name now?

2018-05-17 Thread Thomas Kluyver
Change by Thomas Kluyver: pull_requests: +6616

2018-05-17 Thread Matthias Bussonnier
Matthias Bussonnier added the comment: > Why not just bless the existing generate_tokens() function as a public API, Yes please, or just make the private `_tokenize` public under another name. The `tokenize.tokenize` method tries to magically detect the encoding, which may be unnecessary.
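The "magic" Matthias refers to is detect_encoding(), which tokenize.tokenize() always runs on the first line or two of input, looking for a PEP 263 coding cookie or a UTF-8 BOM. A small illustration (sample sources are mine):

```python
import io
import tokenize

# A PEP 263 cookie on the first line is honoured:
src = b"# -*- coding: utf-8 -*-\nx = 1\n"
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(src).readline)
assert encoding == "utf-8"

# With no cookie and no BOM, the default is UTF-8 (PEP 3120):
encoding, _ = tokenize.detect_encoding(io.BytesIO(b"x = 1\n").readline)
assert encoding == "utf-8"
```

When the caller already has a str, this whole step is redundant, which is the argument for a documented str-based entry point.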

2018-03-11 Thread Thomas Kluyver
Thomas Kluyver added the comment: > Why not just bless the existing generate_tokens() function as a public API We're actually using generate_tokens() from IPython - we wanted a way to tokenize unicode strings, and although it's undocumented, it's been there for a number of releases and does what we need.

2015-10-05 Thread Martin Panter
Martin Panter added the comment: I didn’t notice that this dual untokenize() behaviour already existed. Taking that into account weakens my argument for having separate text and bytes tokenize() functions.
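The dual behaviour Martin concedes here hinges entirely on whether the token stream begins with an ENCODING token; a sketch showing that flipping just that one token flips the output type (the constructed TokenInfo is my illustration, not code from the thread):

```python
import io
import tokenize

src = "pass\n"
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))

# Without an ENCODING token, untokenize() returns str ...
assert isinstance(tokenize.untokenize(toks), str)

# ... but prepending one makes it encode the result to bytes.
enc = tokenize.TokenInfo(tokenize.ENCODING, "utf-8", (0, 0), (0, 0), "")
assert isinstance(tokenize.untokenize([enc] + toks), bytes)
```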

2015-10-04 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: Added file: http://bugs.python.org/file40679/tokenize_str_2.diff

2015-10-04 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Thank you for your review, Martin. Here is a rebased patch that addresses Martin's comments. I agree that having untokenize() change its output type depending on the ENCODING token is bad design and we should change this. But this is perhaps another issue.

2015-10-04 Thread Martin Panter
Martin Panter added the comment: I agree it would be very useful to be able to tokenize arbitrary text without worrying about encoding tokens. I left some suggestions for the documentation changes. Also some test cases for it would be good. However, I wonder if a separate function would be better.

2012-12-28 Thread Meador Inge
Meador Inge added the comment: See also issue9969. nosy: +meador.inge

2012-10-15 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The patch to allow tokenize() to accept a string is very simple, only 4 lines. But it requires a lot of documentation changes. Then we can get rid of the undocumented generate_tokens(). Note that the stdlib and tools use only generate_tokens(); none uses tokenize(). Of course, …

2012-10-13 Thread Eric Snow
Changes by Eric Snow.

2011-07-09 Thread Eric Snow
Changes by Eric Snow: nosy: +ericsnow

2011-07-09 Thread STINNER Victor
STINNER Victor added the comment: The compiler has a PyCF_SOURCE_IS_UTF8 flag: see the compile() builtin. The parser has a flag to ignore the coding cookie: PyPARSE_IGNORE_COOKIE. Patching tokenize to support Unicode is simple: use the PyCF_SOURCE_IS_UTF8 and/or PyPARSE_IGNORE_COOKIE flags and encode the text to UTF-8.

2011-07-08 Thread Terry J. Reedy
Terry J. Reedy added the comment: Hmm. Python 3 code is unicode: "Python reads program text as Unicode code points." The tokenize module purports to provide "a lexical scanner for Python source code", but it seems not to do that. Instead it provides a scanner for Python code encoded as bytes, …

2011-07-08 Thread Petri Lehtinen
Changes by Petri Lehtinen: nosy: +petri.lehtinen

2011-07-04 Thread Éric Araujo
Changes by Éric Araujo: nosy: +eric.araujo, haypo; type: -> feature request; versions: +Python 3.3

2011-07-03 Thread Devin Jeanpierre
New submission from Devin Jeanpierre: tokenize only deals with bytes. Users might want to deal with unicode source (for example, if Python source is embedded into a document with an already-known encoding). The naive approach might be something like: def my_readline(): return my_oldr…
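The snippet in the submission is cut off, but it is evidently sketching a readline wrapper that re-encodes str lines so the bytes-only tokenize.tokenize() can decode them again. A hedged reconstruction of that naive workaround (the names `tokenize_text`, `my_old_readline`, and `my_readline` are illustrative, not from the original message):

```python
import io
import tokenize

def tokenize_text(source: str):
    """Naive workaround: encode the str just so the bytes-only
    tokenize.tokenize() can detect an encoding and decode it again."""
    my_old_readline = io.StringIO(source).readline

    def my_readline():
        # presumably what the truncated snippet was sketching
        return my_old_readline().encode("utf-8")

    return list(tokenize.tokenize(my_readline))
```

Note the round trip through UTF-8 is wasted work, and it silently misbehaves if the source carries a coding cookie naming a different encoding, which is precisely why the issue asks for a real str API.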