On Wed, 22 Apr 2009 14:35:29 -0400, William Witteman <y...@nerd.cx> wrote:
> I need to be able to decompose a formatted text file into identifiable,
> possibly named pieces. To tokenize it, in other words. There seem to
> be a vast array of modules to do this with (simpleparse, pyparsing etc)
> but I cannot understand their documentation.

I would recommend pyparsing, but this is an opinion.

> The file format I am looking at (it is a bibliographic reference file)
> looks like this:
>
> <1>  # the references are enumerated
> AU  - some text
>       perhaps across lines
> AB  - some other text
> AB  - there may be multiples of some fields
> UN  - any 2-letter combination may exist; other than by exhaustion, I
> cannot anticipate what will be found

Regular expressions may be enough, depending on your actual needs.

> What I am looking for is some help to get started, either with
> explaining the implementation of one of the modules with respect to my
> format, or with an approach that I could use from the base library.

The question is: what do you need from the data? What do you expect as a
result? The best thing is to provide an example of the result you want for
the sample data, e.g. "I wish as a result a dictionary looking like":

{
    'AU': 'some text\nperhaps across lines',
    'AB': ['some other text', 'there may be multiples of some fields'],
    'UN': 'any 2-letter combination may exist...',
    ...
}

The choice of an appropriate tool, and hints on possible algorithms, follow
from this.

> Thanks.

Denis
------
la vita e estrany
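
P.S. In case it helps you get started: here is a minimal sketch of the
regular-expression approach, assuming (from your sample only) that a
reference starts with a line like "<1>", that a field line is two capital
letters followed by "- ", and that any other non-blank line continues the
previous field. The names (REF_RE, FIELD_RE, parse) are mine, not from any
module.

import re

# Assumptions, taken from the sample: "<n>" opens a reference,
# "XX  - text" opens a field, anything else continues the current field.
REF_RE = re.compile(r'^<(\d+)>')
FIELD_RE = re.compile(r'^([A-Z]{2})\s+-\s+(.*)$')

def parse(lines):
    """Yield one dict per reference; repeated tags collect into a list."""
    record, tag = None, None
    for line in lines:
        line = line.rstrip('\n')
        if REF_RE.match(line):            # start of a new reference
            if record is not None:
                yield record
            record, tag = {}, None
        elif record is None:
            continue                      # skip text before the first <n>
        else:
            m = FIELD_RE.match(line)
            if m:                         # a new "XX - text" field
                tag, text = m.groups()
                if tag in record:         # repeated tag -> list of values
                    if not isinstance(record[tag], list):
                        record[tag] = [record[tag]]
                    record[tag].append(text)
                else:
                    record[tag] = text
            elif tag is not None and line.strip():
                # continuation line: append to the last value of the tag
                if isinstance(record[tag], list):
                    record[tag][-1] += '\n' + line.strip()
                else:
                    record[tag] += '\n' + line.strip()
    if record is not None:
        yield record

Running it on your sample:

sample = """<1>
AU  - some text
      perhaps across lines
AB  - some other text
AB  - there may be multiples of some fields
"""
for reference in parse(sample.splitlines()):
    print(reference)
# {'AU': 'some text\nperhaps across lines',
#  'AB': ['some other text', 'there may be multiples of some fields']}

This produces the dictionary shape sketched above; whether plain re is
enough, or pyparsing worth learning, depends on how regular the real
files are.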