Re: [Tutor] Tokenizing Help

2009-04-23 Thread Paul McGuire
For the given test case, this pyparsing sample parses the data without having to anticipate all the possible 2-letter keys.
    from pyparsing import *
    integer = Word(nums)
    DASH = Literal('-').suppress()
    LT = Literal('<').suppress()
    GT = Literal('>').suppress()
    entrynum = LT + integer + GT
    keycod

Re: [Tutor] Tokenizing Help

2009-04-22 Thread Kent Johnson
On Wed, Apr 22, 2009 at 9:41 PM, William Witteman wrote:
> On Wed, Apr 22, 2009 at 11:23:11PM +0200, Eike Welk wrote:
>
>>How do you decide that a word is a keyword (AU, AB, UN) and not a part
>>of the text? There could be a file like this:
>>
>><567>
>>AU  - Bibliographical Theory and Practice -

Re: [Tutor] Tokenizing Help

2009-04-22 Thread William Witteman
On Wed, Apr 22, 2009 at 11:23:11PM +0200, Eike Welk wrote:
>How do you decide that a word is a keyword (AU, AB, UN) and not a part
>of the text? There could be a file like this:
>
><567>
>AU - Bibliographical Theory and Practice - Volume 1 - The AU - Tag
>and its applications
>AB - Texts in
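One way to approach the keyword-versus-text question is to accept a key only at the very start of a line, so that an "AU - " in the middle of a title stays part of the text. The following is a minimal sketch of that idea using only the standard `re` module; it is not the pyparsing sample posted elsewhere in the thread. The names `ENTRY`, `FIELD` and `tokenize` are hypothetical, and the sketch assumes that keys are exactly two uppercase letters followed by " - " and that entry numbers sit alone on a line. The sample combines the quoted `<567>` example with a field line from the format description. A continuation line that happens to begin with two capitals and " - " would still be misread as a new field.

```python
import re

# Assumed shapes, taken from the examples quoted in the thread:
ENTRY = re.compile(r"^<(\d+)>\s*$")             # e.g. "<567>"
FIELD = re.compile(r"^([A-Z]{2})\s+-\s+(.*)$")  # e.g. "AU - some text"


def tokenize(lines):
    """Return a list of (entry_number, [(key, value), ...]) tuples."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        entry = ENTRY.match(line)
        if entry:
            # A new reference starts; open an empty field list for it.
            records.append((int(entry.group(1)), []))
            continue
        if not records or not line.strip():
            continue  # ignore blank lines and anything before the first entry
        fields = records[-1][1]
        field = FIELD.match(line)
        if field:
            # Key found at the start of the line: begin a new field.
            fields.append((field.group(1), field.group(2).strip()))
        elif fields:
            # No key at line start: treat the line as a continuation.
            key, value = fields[-1]
            fields[-1] = (key, value + " " + line.strip())
    return records


sample = """<567>
AU - Bibliographical Theory and Practice - Volume 1 - The AU - Tag
and its applications
AB - some other text
"""

records = tokenize(sample.splitlines())
for number, fields in records:
    print(number, fields)
```

Because repeated keys are stored as a list of pairs rather than in a dict, multiple `AB` fields in one reference are all preserved in order.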

Re: [Tutor] Tokenizing Help

2009-04-22 Thread William Witteman
On Wed, Apr 22, 2009 at 05:16:56PM -0400, bob gailer wrote:
>> <1> # the references are enumerated
>> AU - some text
>> perhaps across lines
>> AB - some other text
>> AB - there may be multiples of some fields
>> UN - any 2-letter combination may exist, other than by exhausti

Re: [Tutor] Tokenizing Help

2009-04-22 Thread Eike Welk
Hello William!
On Wednesday 22 April 2009, William Witteman wrote:
> The file format I am looking at (it is a bibliographic reference
> file) looks like this:
>
> <1> # the references are enumerated
> AU - some text
> perhaps across lines
> AB - some other text
> AB - there ma

Re: [Tutor] Tokenizing Help

2009-04-22 Thread bob gailer
William Witteman wrote: I need to be able to decompose a formatted text file into identifiable, possibly named pieces. To tokenize it, in other words. There seem to be a vast array of modules to do this with (simpleparse, pyparsing etc) but I cannot understand their documentation. The file for

Re: [Tutor] Tokenizing Help

2009-04-22 Thread William Witteman
On Wed, Apr 22, 2009 at 09:23:30PM +0200, spir wrote:
>> I need to be able to decompose a formatted text file into identifiable,
>> possibly named pieces. To tokenize it, in other words. There seem to
>> be a vast array of modules to do this with (simpleparse, pyparsing etc)
>> but I cannot unde

Re: [Tutor] Tokenizing Help

2009-04-22 Thread spir
On Wed, 22 Apr 2009 14:35:29 -0400, William Witteman wrote:
> I need to be able to decompose a formatted text file into identifiable,
> possibly named pieces. To tokenize it, in other words. There seem to
> be a vast array of modules to do this with (simpleparse, pyparsing etc)
> but

[Tutor] Tokenizing Help

2009-04-22 Thread William Witteman
I need to be able to decompose a formatted text file into identifiable, possibly named pieces. To tokenize it, in other words. There seem to be a vast array of modules to do this with (simpleparse, pyparsing etc) but I cannot understand their documentation. The file format I am looking at (it is