This pyparsing sample parses the given test case without having to
anticipate every possible 2-letter key:
from pyparsing import *
integer = Word(nums)
DASH = Literal('-').suppress()
LT = Literal('<').suppress()
GT = Literal('>').suppress()
entrynum = LT + integer + GT
keycod
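For anyone who wants to try the idea without installing pyparsing, here is a rough sketch of the same approach using only the standard `re` module. It is not the pyparsing sample above, just an illustration under a few assumptions: an entry starts with a line like `<1>`, a field starts with any two capital letters followed by ` - ` at the beginning of a line, and any other non-blank line continues the previous field. The names `parse_references`, `ENTRY_RE` and `FIELD_RE` are made up for this example.

```python
import re

# Assumed layout: "<n>" opens an entry, "XX - text" opens a field.
ENTRY_RE = re.compile(r"^<(\d+)>")
FIELD_RE = re.compile(r"^([A-Z]{2}) -\s*(.*)$")


def parse_references(text):
    """Return a list of (entry_number, fields) tuples.

    fields is a list of (key, value) pairs rather than a dict,
    so repeated keys such as two AB lines are all kept.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        entry = ENTRY_RE.match(line)
        if entry:
            entries.append((int(entry.group(1)), []))
            continue
        if not entries:
            continue  # ignore anything before the first <n> marker
        fields = entries[-1][1]
        field = FIELD_RE.match(line)
        if field:
            # Any two-letter key is accepted; none are listed in advance.
            fields.append((field.group(1), field.group(2)))
        elif fields:
            # Continuation line: append it to the previous field's value.
            key, value = fields[-1]
            fields[-1] = (key, value + " " + line)
    return entries


sample = """\
<1>
AU - some text
perhaps across lines
AB - some other text
AB - there may be multiples of some fields
"""

parsed = parse_references(sample)
for number, fields in parsed:
    print(number, fields)
```

Keeping the fields as a list of pairs is a deliberate choice here, since the format allows the same key to appear more than once in one entry.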
On Wed, Apr 22, 2009 at 9:41 PM, William Witteman wrote:
> On Wed, Apr 22, 2009 at 11:23:11PM +0200, Eike Welk wrote:
>
>>How do you decide that a word is a keyword (AU, AB, UN) and not a part
>>of the text? There could be a file like this:
>>
>><567>
>>AU - Bibliographical Theory and Practice -
On Wed, Apr 22, 2009 at 11:23:11PM +0200, Eike Welk wrote:
>How do you decide that a word is a keyword (AU, AB, UN) and not a part
>of the text? There could be a file like this:
>
><567>
>AU - Bibliographical Theory and Practice - Volume 1 - The AU - Tag
>and its applications
>AB - Texts in
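One possible answer to the question quoted above, sketched with the standard `re` module: treat a two-letter code as a key only when it sits at the very start of a line. That rule is an assumption about the format, not something the thread confirms, and it would still misfire if a continuation line happened to begin with something like `AU - `. The name `TAG_AT_LINE_START` is made up for this example.

```python
import re

# Assumption: a key is two capital letters plus " - " at the start of a line.
TAG_AT_LINE_START = re.compile(r"^([A-Z]{2}) - ", re.MULTILINE)

# The same pattern without the line-start anchor, for comparison.
TAG_ANYWHERE = re.compile(r"([A-Z]{2}) - ")

tricky = (
    "<567>\n"
    "AU - Bibliographical Theory and Practice - Volume 1 - The AU - Tag\n"
    "and its applications\n"
)

anchored = TAG_AT_LINE_START.findall(tricky)
unanchored = TAG_ANYWHERE.findall(tricky)

print(anchored)    # only the key at the start of the line
print(unanchored)  # also picks up the "AU - " inside the text
```

With the anchor, the `AU - ` in the middle of the title is left alone as ordinary text.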
On Wed, Apr 22, 2009 at 05:16:56PM -0400, bob gailer wrote:
>> <1> # the references are enumerated
>> AU - some text
>> perhaps across lines
>> AB - some other text
>> AB - there may be multiples of some fields
>> UN - any 2-letter combination may exist, other than by exhausti
Hello William!
On Wednesday 22 April 2009, William Witteman wrote:
> The file format I am looking at (it is a bibliographic reference
> file) looks like this:
>
> <1> # the references are enumerated
> AU - some text
> perhaps across lines
> AB - some other text
> AB - there ma
William Witteman wrote:
I need to be able to decompose a formatted text file into identifiable,
possibly named pieces. To tokenize it, in other words. There seem to
be a vast array of modules to do this with (simpleparse, pyparsing etc)
but I cannot understand their documentation.
The file for
On Wed, Apr 22, 2009 at 09:23:30PM +0200, spir wrote:
>> I need to be able to decompose a formatted text file into identifiable,
>> possibly named pieces. To tokenize it, in other words. There seem to
>> be a vast array of modules to do this with (simpleparse, pyparsing etc)
>> but I cannot unde
On Wed, 22 Apr 2009 14:35:29 -0400,
William Witteman wrote:
> I need to be able to decompose a formatted text file into identifiable,
> possibly named pieces. To tokenize it, in other words. There seem to
> be a vast array of modules to do this with (simpleparse, pyparsing etc)
> but
I need to be able to decompose a formatted text file into identifiable,
possibly named pieces. To tokenize it, in other words. There seem to
be a vast array of modules to do this with (simpleparse, pyparsing etc)
but I cannot understand their documentation.
The file format I am looking at (it is