[Python-Dev] tokenize string literal problem

C or L Smith Fri, 23 Oct 2009 23:13:18 -0700

BACKGROUND
    I'm trying to modify the doctest DocTestParser so it will parse docstring 
code snippets out of a *py file. (Although doctest can parse these with another 
method out of *pyc, it is missing certain decorated functions and we would also 
like to insist of import of needed modules rather and that method automatically 
loads everything from the module containing the code.)


PROBLEM
    I need to find code snippets which are located in docstrings. Docstrings, 
being string literals should be able to be parsed out with tokenize. But 
tokenize is giving the wrong results (or I am doing something wrong) for this 
(pathological) case:

foo.py:
+----
def bar():
    """
    A quoted triple quote is not a closing
    of this docstring:
    >>> print '"""'
    """
    """ # <-- this is the closing quote
    pass
+----

Here is how I tokenize the file:

###
import re, tokenize
DOCSTRING_START_RE = re.compile('\s+[ru]*("""|' + "''')")

o=open('foo.py','r')
for ti in tokenize.generate_tokens(o.next):
    typ = ti[0]
    text = ti[-1]
    if typ == tokenize.STRING:
        if DOCSTRING_START_RE.match(text):
            print "DOCSTRING:",repr(text)
o.close()
###

which outputs:

DOCSTRING: '    """\n    A quoted triple quote is not a closing\n    of this 
docstring:\n    >>> print \'"""\'\n'
DOCSTRING: '    """\n    """ # <-- this is the closing quote\n'

There should be only one string tokenized, I believe. The PythonWin editor 
parses (and colorizes) this correctly, but tokenize (or I) are making an error.

Thanks for any help,
Chris
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

[Python-Dev] tokenize string literal problem

Reply via email to