I need to be able to decompose a formatted text file into identifiable, possibly named pieces; in other words, to tokenize it. There seems to be a vast array of modules for doing this (simpleparse, pyparsing, etc.), but I cannot understand their documentation.
The file format I am looking at (it is a bibliographic reference file) looks like this:

<1>  # the references are enumerated
AU - some text
     perhaps across lines
AB - some other text
AB - there may be multiples of some fields
UN - any 2-letter combination may exist, other than by exhaustion, I cannot anticipate what will be found

What I am looking for is some help to get started, either with explaining the implementation of one of the modules with respect to my format, or with an approach that I could use from the base library.

Thanks.

--
yours,
William
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
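
For the format described above, one possible starting point that uses only the standard library's re module might look like the sketch below. It rests on assumptions that the post does not confirm: that a record begins with a line starting with "<n>", that a field begins with a two-character uppercase tag followed by " - ", and that any other non-blank line continues the previous field. The names parse_references, RECORD_RE and FIELD_RE are made up for illustration.

```python
import re

# Assumption: "<n>" at the start of a line opens a new reference.
RECORD_RE = re.compile(r"^<(\d+)>")
# Assumption: a field is a 2-character tag, a hyphen, then the text.
FIELD_RE = re.compile(r"^([A-Z][A-Z0-9])\s*-\s?(.*)$")


def parse_references(text):
    """Return a list of (number, fields); fields is a list of [tag, value]."""
    records = []
    fields = None  # field list of the record currently being read
    for line in text.splitlines():
        record = RECORD_RE.match(line)
        if record:
            fields = []
            records.append((int(record.group(1)), fields))
            continue
        if fields is None or not line.strip():
            continue  # skip blank lines and anything before the first record
        field = FIELD_RE.match(line)
        if field:
            # A list of pairs (not a dict) keeps repeated tags such as AB.
            fields.append([field.group(1), field.group(2).strip()])
        elif fields:
            # Continuation line: append to the previous field's text.
            fields[-1][1] += " " + line.strip()
    return records


sample = """<1>
AU - some text
     perhaps across lines
AB - some other text
AB - there may be multiples of some fields
"""

for number, fields in parse_references(sample):
    for tag, value in fields:
        print(number, tag, value)
```

Because the tags are not known in advance, the pattern accepts any two-character uppercase tag rather than a fixed list. One known weakness of this sketch: a continuation line that happens to begin with something like "XY - " would be mistaken for a new field.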