On Fri, Feb 27, 2009 at 2:59 AM, wesley chun <wes...@gmail.com> wrote:
> > There is a text file that looks like this:
> >
> > text text text <ID>Joseph</text text text>
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text <Full name> Joseph Smith</text text text>
> > text text text <Rights> 1</text text text>
> > text text text <LDAP> 0</text text text>
> >
> > What I am trying to do is:
> >
> > 1. I need to extract the name and the full name from this text file. For
> > example: ( ID is Joseph & Full name is Joseph Smith).
>
>
> in addition to denis' suggestion of using regular expressions, you can
> also look at the xml.etree module and have ElementTree parse them into
> tags for you, so all you have to do is ask for the ID and "Full name"
> tags to get your data.
>
> good luck!
> -- wesley
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> "Core Python Programming", Prentice Hall, (c)2007,2001
> "Python Fundamentals", Prentice Hall, (c)2009
> http://corepython.com
>
> wesley.j.chun :: wescpy-at-gmail.com
> python training and technical consulting
> cyberweb.consulting : silicon valley, ca
> http://cyberwebconsulting.com
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
>

Since I'm learning Pyparsing, this was a nice exercise.
I've written this elementary script, which does the job for the sample data we have:

from pyparsing import Literal, OneOrMore, Word, alphas

# Literal pieces of the two tags we care about
ID_TAG = Literal("<ID>")
FULL_NAME_TAG1 = Literal("<Full")
FULL_NAME_TAG2 = Literal("name>")
END_TAG = Literal("</")
word = Word(alphas)

# <ID>name</   or   <Full name> first last</
pattern1 = ID_TAG + word + END_TAG
pattern2 = FULL_NAME_TAG1 + FULL_NAME_TAG2 + OneOrMore(word) + END_TAG
result = pattern1 | pattern2

with open("lines.txt") as lines:  # This is your file name
    for line in lines:
        myresult = result.searchString(line)
        if myresult:
            print(myresult[0])

# This prints out
# ['<ID>', 'Joseph', '</']
# ['<Full', 'name>', 'Joseph', 'Smith', '</']
# You can access the individual elements of the lists to pick whatever you want

--
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد الغزالي
"No victim has ever been more repressed and alienated than the truth"

Emad Soliman Nawfal
Indiana University, Bloomington
http://emnawfal.googlepages.com
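For comparison, here is a minimal sketch of the regular-expression approach mentioned in the quoted message, using only the standard library re module. The function name extract_names and the sample string are illustrative assumptions, not part of the original script, and the patterns assume each value sits on one line between its opening tag and the next "</", as in the sample data.

```python
import re

# Assumption: each value sits between its opening tag and the next "</"
# on the same line, exactly as in the sample data from the question.
ID_RE = re.compile(r"<ID>\s*(.*?)\s*</")
FULL_NAME_RE = re.compile(r"<Full name>\s*(.*?)\s*</")


def extract_names(text):
    """Return (id, full_name); either item is None if its tag is not found."""
    id_match = ID_RE.search(text)
    name_match = FULL_NAME_RE.search(text)
    return (
        id_match.group(1) if id_match else None,
        name_match.group(1) if name_match else None,
    )


# Hypothetical sample mirroring the layout described in the question.
sample = """text text text <ID>Joseph</text text text>
text text text text text text text text text text text
text text text <Full name> Joseph Smith</text text text>
text text text <Rights> 1</text text text>
"""

user_id, full_name = extract_names(sample)
print(user_id)
print(full_name)
```

Unlike the pyparsing version, this returns the two values directly as strings, so there is no need to index into a result list. It would need adjusting if a value could span several lines.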