[Python-Dev] tokenize string literal problem

2009-10-23 Thread C or L Smith
BACKGROUND
I'm trying to modify the doctest DocTestParser so that it will parse docstring 
code snippets out of a *.py file. (Although doctest can extract these from the *.pyc 
by another method, that method misses certain decorated functions. We would also 
like to insist on explicit imports of the needed modules, whereas that method 
automatically loads everything from the module containing the code.)
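For anyone attempting the same thing on a modern Python 3, one way to collect doctest examples from a .py file without importing it is to parse the source with the ast module and hand each docstring to DocTestParser.get_examples. This is only a sketch, and it is not the tokenize-based approach discussed in this thread: it swaps in ast, and the helper name doctest_examples_from_source and the sample SOURCE are hypothetical. Because ast reads the def statement itself, a decorated function's docstring is taken from the source regardless of what the decorator does at runtime.

```python
import ast
import doctest

def doctest_examples_from_source(source):
    """Return (name, example source) pairs for every docstring in source.

    Hypothetical helper: the module is parsed, never imported or executed.
    """
    parser = doctest.DocTestParser()
    results = []
    for node in ast.walk(ast.parse(source)):
        # ast.get_docstring only accepts these node types.
        if not isinstance(node, (ast.Module, ast.ClassDef,
                                 ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        doc = ast.get_docstring(node)  # indentation is cleaned by default
        if not doc:
            continue
        name = getattr(node, 'name', '<module>')
        for example in parser.get_examples(doc, name):
            results.append((name, example.source))
    return results

# Hypothetical sample module with a decorated function.
SOURCE = '''
import functools

def deco(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@deco
def bar():
    """
    >>> 1 + 1
    2
    """
'''

print(doctest_examples_from_source(SOURCE))  # [('bar', '1 + 1\n')]
```

Only the example text is returned, so any imports an example needs must appear in the docstring itself, which matches the goal of insisting on explicit imports.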

PROBLEM
I need to find code snippets which are located in docstrings. Docstrings, 
being string literals, should be extractable with tokenize. But 
tokenize is giving the wrong results (or I am doing something wrong) for this 
(pathological) case:

foo.py:
+
def bar():
    """
    A quoted triple quote is not a closing
    of this docstring:
    >>> print '"""'
    """
    """ # <-- this is the closing quote
    pass
+

Here is how I tokenize the file:

###
import re, tokenize
DOCSTRING_START_RE = re.compile(r'\s+[ru]*("""|' + "''')")

o = open('foo.py', 'r')
for ti in tokenize.generate_tokens(o.next):
    typ = ti[0]
    text = ti[-1]  # the physical source line(s), not just the token text
    if typ == tokenize.STRING:
        if DOCSTRING_START_RE.match(text):
            print "DOCSTRING:", repr(text)
o.close()
###
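On Python 3, file objects have no .next method and tokens are named tuples, so the loop above might be written roughly as follows. This is a hedged sketch; the helper name triple_quoted_strings and the SAMPLE input are hypothetical. It also makes explicit that ti[-1] in the original loop is the token's physical line rather than the literal, which is why DOCSTRING_START_RE needs the leading \s+.

```python
import io
import re
import tokenize

# Mirrors DOCSTRING_START_RE above, but it is matched against tok.string
# (the literal itself), so no leading-whitespace pattern is needed.
TRIPLE_QUOTE_START_RE = re.compile(r'[rRuU]*("""|' + r"''')")

def triple_quoted_strings(source):
    """Return the text of every triple-quoted STRING token in source."""
    readline = io.StringIO(source).readline
    found = []
    for tok in tokenize.generate_tokens(readline):
        # tok.string is the token text; tok.line is the physical line(s),
        # which is what ti[-1] refers to in the Python 2 loop above.
        if tok.type == tokenize.STRING and TRIPLE_QUOTE_START_RE.match(tok.string):
            found.append(tok.string)
    return found

SAMPLE = 'def bar():\n    """Docstring."""\n    x = "plain"\n    pass\n'
print(triple_quoted_strings(SAMPLE))  # ['"""Docstring."""']
```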

which outputs:

DOCSTRING: '"""\nA quoted triple quote is not a closing\nof this docstring:\n>>> print \'"""\'\n'
DOCSTRING: '"""\n""" # <-- this is the closing quote\n'

I believe only one string should be tokenized. The PythonWin editor 
parses (and colorizes) this correctly, but either tokenize or I am making an error.
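As the follow-up in this thread concludes, tokenize is doing what the language defines: a triple-quoted literal ends at the first unescaped triple double quote, and a single quote inside it has no special meaning. The Python 3 sketch below uses a hypothetical one-line SOURCE (chosen so that it still tokenizes without errors) to show the same split into two STRING tokens; string_tokens is an illustrative helper name.

```python
import io
import tokenize

# Hypothetical source: text = """a '""" '"""'
# The intent might be one literal containing '"""', but the """ right after
# the single quote closes the first literal, exactly as in foo.py.
SOURCE = 'text = """a \'""" \'"""\'\n'

def string_tokens(source):
    """Return the literal text of every STRING token in source."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.STRING]

for literal in string_tokens(SOURCE):
    print(literal)
```

The two literals are then joined by implicit concatenation, so this particular line happens to be valid; in foo.py the leftover text is not.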

Thanks for any help,
Chris
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] tokenize string literal problem

2009-10-23 Thread C or L Smith
C or L Smith wrote:
> PROBLEM
> I need to find code snippets which are located in docstrings.
> Docstrings, being string literals should be able to be parsed out
> with tokenize. But tokenize is giving the wrong results (or I am
> doing something wrong) for this (pathological) case:   
> 
> foo.py:
> +
> def bar():
> """
> A quoted triple quote is not a closing
> of this docstring:
> >>> print '"""'
> """
> """ # <-- this is the closing quote
> pass
> +
> 

I now see that the code snippet I created is invalid. Myopia. What PythonWin 
was displaying correctly was my sample as a STRING, not as code: I had 
delimited it with triple single quotes, so it showed up correctly. In fact, if 
entered as code, it would show that the docstring contents need to be delimited 
with ''' rather than """.
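Following that advice, here is a Python 3 sketch of the corrected snippet (print written as a function, otherwise the same text as foo.py) with the docstring delimited by '''. It checks that tokenize now yields a single STRING token and that DocTestParser recovers the example and its expected output. The names FIXED_LINES, FIXED_SOURCE and string_tokens are illustrative only.

```python
import ast
import doctest
import io
import tokenize

# Corrected foo.py, assuming Python 3 syntax: the docstring uses ''' so the
# """ in the example (and in its expected output) no longer ends it.
FIXED_LINES = [
    "def bar():",
    "    '''",
    "    A quoted triple quote is not a closing",
    "    of this docstring:",
    "    >>> print('\"\"\"')",
    '    """',
    "    '''",
    "    pass",
]
FIXED_SOURCE = "\n".join(FIXED_LINES) + "\n"

def string_tokens(source):
    """Return the literal text of every STRING token in source."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.STRING]

strings = string_tokens(FIXED_SOURCE)
print(len(strings))  # 1: the whole docstring is a single token

# Turn the literal into the docstring's value, then let doctest parse it.
docstring = ast.literal_eval(strings[0])
example = doctest.DocTestParser().get_examples(docstring)[0]
print(repr(example.source), repr(example.want))
```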

Sorry!
/c