For the given test case, this pyparsing sample parses the data without having to anticipate all the possible 2-letter keys.
from pyparsing import *

integer = Word(nums)
DASH = Literal('-').suppress()
LT = Literal('<').suppress()
GT = Literal('>').suppress()

# each entry starts with a reference number in angle brackets, like <567>
entrynum = LT + integer + GT

# a key is two capital letters starting in column 1, followed by a dash
keycode = Word(alphas.upper(), exact=2)
key = GoToColumn(1).suppress() + keycode + DASH

# a value is everything up to the next key, the next entry, or the end of the input
data = Group(key("key") + Empty() + SkipTo(key | entrynum | StringEnd())("value"))
entry = entrynum("refnum") + OneOrMore(data)("data")

# test holds the sample text from the original question
for e in entry.searchString(test):
    print(e.refnum)
    for dd in e.data:
        print(dd.key, ':', dd.value)
    print()

Prints:

['567']
['AU'] : Bibliographical Theory and Practice - Volume 1 - The AU - Tag and its applications
['AB'] : Texts in Library Science

['568']
['AU'] : Bibliographical Theory and Practice - Volume 2 - The
['AB'] : Tag and its applications
['AB'] : Texts in Library Science

['569']
['AU'] : AU - AU - AU - AU - AU - AU - AU - AU - AU - AU -
['AU'] : AU - AU - AU - AU - AU - AU - AU - AU - AU - AU
['AB'] : AU - AU - AU - AU - AU - AU - AU - AU - AU - AU -
['AU'] : AU - AU - AU - AU - AU - AU - AU - AU - AU - AU
['ZZ'] : Somewhat nonsensical case

If you find that you also have to accept keycodes that consist of a capital letter followed by a numeric digit (like "B7"), modify the keycode definition to be:

keycode = Word(alphas.upper(), alphanums.upper(), exact=2)

-- Paul

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
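The test string from the original question isn't reproduced here, so below is a small self-contained sketch for trying the same approach. It is an illustration under a few assumptions, not the code above verbatim: it assumes Python 3 and pyparsing 3.x (hence search_string, the PEP 8 name for searchString), the SAMPLE text and the parse_entries helper are made up for this example, and it uses the letter-plus-digit keycode variant so that a "B7" key is accepted. The results names are attached to the single-token integer and keycode expressions rather than to the whole entrynum/key sequences, with the intent that each field comes back as a plain string instead of a list like ['AU'].

```python
from pyparsing import (Empty, GoToColumn, Group, Literal, OneOrMore, SkipTo,
                       StringEnd, Word, alphanums, alphas, nums)

# Hypothetical sample input: entries start with <n>, and each field starts
# with a 2-character key in column 1. The "AU -" in the middle of the first
# field should stay part of that field's value.
SAMPLE = """\
<1>
AU - Smith, J. AU - not a key here
TI - A Title
<2>
ZZ - Odd key
B7 - Letter and digit key
"""

integer = Word(nums)
DASH = Literal("-").suppress()
LT = Literal("<").suppress()
GT = Literal(">").suppress()

# Results names go on the single-token Words, so each field is a plain string.
entrynum = LT + integer("refnum") + GT
# Letter followed by a letter or digit, so keys like B7 are accepted too.
keycode = Word(alphas.upper(), alphanums.upper(), exact=2)
# GoToColumn(1) makes a key match only at the start of a line.
key = GoToColumn(1).suppress() + keycode("key") + DASH
# A value runs up to the next key, the next entry number, or end of input.
field_value = SkipTo(key | entrynum | StringEnd())("value")
# Empty() skips the whitespace after the dash before SkipTo collects text.
data = Group(key + Empty() + field_value)
entry = entrynum + OneOrMore(data)("data")


def parse_entries(text):
    """Hypothetical helper: return [(refnum, [(key, value), ...]), ...]."""
    return [
        # strip() trims any whitespace left around the skipped text
        (e.refnum, [(d.key, d.value.strip()) for d in e.data])
        for e in entry.search_string(text)
    ]


for refnum, fields in parse_entries(SAMPLE):
    print(refnum)
    for k, v in fields:
        print(k, ":", v)
```

The first field shows the point of GoToColumn(1): the "AU -" in the middle of that line does not start a line, so it stays inside the value instead of being treated as a new key.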