[Tutor] Tokenizing Help

William Witteman Wed, 22 Apr 2009 11:36:17 -0700

I need to be able to decompose a formatted text file into identifiable,
possibly named pieces.  To tokenize it, in other words.  There seems to
be a vast array of modules for this (simpleparse, pyparsing, etc.),
but I cannot understand their documentation.


The file format I am looking at (it is a bibliographic reference file)
looks like this:

<1>                   # the references are enumerated
AU  - some text
perhaps across lines
AB  - some other text
AB  - there may be multiples of some fields
UN  - any 2-letter combination may exist; other than by exhaustion, I
cannot anticipate what will be found

What I am looking for is some help getting started: either an
explanation of how to apply one of these modules to my format, or an
approach that I could build from the standard library alone.
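
As a starting point, a standard-library approach could look like the
sketch below.  It is only a sketch under assumptions drawn from the
sample above: a record starts with a line like "<1>", a field starts
with a 2-character tag followed by spaces and a hyphen, and any other
non-blank line continues the previous field.  The names
parse_references, RECORD_RE and FIELD_RE are made up for illustration
and do not come from any module.

```python
import re

# Assumed line shapes, inferred from the sample only (no format spec):
RECORD_RE = re.compile(r"^<(\d+)>")                    # e.g. "<1>"
FIELD_RE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s*(.*)$")  # e.g. "AU  - text"


def parse_references(lines):
    """Group lines into records of [tag, text] fields.

    Fields are kept in a list rather than a dict because the same tag
    may occur more than once within a record.
    """
    records = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = RECORD_RE.match(line)
        if match:
            # A new enumerated reference begins.
            current = {"number": int(match.group(1)), "fields": []}
            records.append(current)
            continue
        if current is None or not line.strip():
            # Skip blank lines and anything before the first record.
            continue
        match = FIELD_RE.match(line)
        if match:
            current["fields"].append([match.group(1), match.group(2).strip()])
        elif current["fields"]:
            # Continuation line: append to the most recent field.
            current["fields"][-1][1] += " " + line.strip()
    return records


sample = """<1>
AU  - some text
perhaps across lines
AB  - some other text
AB  - there may be multiples of some fields
"""

records = parse_references(sample.splitlines())
for tag, text in records[0]["fields"]:
    print(tag, "->", text)
```

One known weakness: a continuation line that happens to begin with
two capitals, spaces and a hyphen would be mistaken for a new field.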

Thanks.
-- 

yours,

William

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
