On Thu, 26 Feb 2009 21:53:43 -0800, Mohamed Hassan <linuxlove...@gmail.com> wrote:
> Hi all,
>
> I am new to Python and still trying to figure out some things. Here is the
> situation:
>
> There is a text file that looks like this:
>
> text text text <ID>Joseph</text text text>
> text text text text text text text text text text text
> text text text text text text text text text text text
> text text text text text text text text text text text
> text text text text text text text text text text text
> text text text text text text text text text text text
> text text text text text text text text text text text
> text text text text text text text text text text text
> text text text <Full name> Joseph Smith</text text text>
> text text text <Rights> 1</text text text>
> text text text <LDAP> 0</text text text>
>
>
> This text file is very long, however all the entries in it look the same as
> the above.
>
> What I am trying to do is:
>
> 1. I need to extract the name and the full name from this text file. For
> example: ( ID is Joseph & Full name is Joseph Smith).
>
>
> - I am thinking I need to write something that will check the whole text
> file line by line which I have done already.
> - Now what I am trying to figure out is : How can I write a function that
> will check to see if the line contains the word ID between < > then copy the
> letters after > until > and dump it to a text file.
>
> Can somebody help please. I know this might sound easy for some people, but
> again I am new to Python and still figuring out things.
>
> Thank you

This is a typical text-parsing job, and there are tools for it. However, we would probably need a bit more information about the real structure of the text, and above all about what you want to do with the data afterwards, to point you to the most appropriate tool. I guess there is a higher-level structure that groups the ID, name, rights, etc. into one section, and that you will need to keep them together for further processing.

Anyway, for a first exploration you can use regular expressions (regex) to extract individual data items.
For instance:

    from re import compile as Pattern

    pattern = Pattern(r""".*<ID>(.+)<.+>.*""")

    # a single line
    line = "text text text <ID>Joseph</text text text>"
    print(pattern.findall(line))

    # several lines at once
    text = """\
    text text text <ID>Joseph</text text text>
    text text text <ID>Jodia</text text text>
    text text text <ID>Joobawap</text text text>
    """
    print(pattern.findall(text))

==>

    ['Joseph']
    ['Joseph', 'Jodia', 'Joobawap']

There is a nice regex tutorial in the Python documentation (the Regular Expression HOWTO); you will find it easily. Key points of this example are:

    r""".*<ID>(.+)<.+>.*"""

* The pattern between r"""...""" expresses the overall format to be matched.
* Everything between (...) will be extracted by findall.
* '.' means 'any character' (except a newline, by default, which is why the multi-line text gives one result per line); '*' means zero or more of what is just before; '+' means one or more of what is just before.

So the pattern will look for strings that contain a sequence formed of:
1. possible start chars
2. <ID> literally
3. one or more chars -- to return
4. something between <...>
5. possible end chars

Denis
------
la vita e estrany
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
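
Here is a minimal sketch of how the same idea could be applied to the original task: read the file line by line and write each ID with its full name to a text file. The function names extract_records and dump_records, the two patterns, and the tab-separated output format are assumptions made for illustration, not something given in the question. The sketch also assumes that each <ID> line comes before the <Full name> line of the same entry.

```python
import re

# Assumed tag formats, copied from the sample in the question.
ID_PATTERN = re.compile(r"<ID>\s*([^<]+)<")
NAME_PATTERN = re.compile(r"<Full name>\s*([^<]+)<")


def extract_records(lines):
    """Yield (id, full_name) pairs from an iterable of lines.

    Assumes the <ID> line of an entry appears before its <Full name> line.
    """
    current_id = None
    for line in lines:
        match = ID_PATTERN.search(line)
        if match:
            # Remember the ID until the matching full name shows up.
            current_id = match.group(1).strip()
            continue
        match = NAME_PATTERN.search(line)
        if match and current_id is not None:
            yield current_id, match.group(1).strip()
            current_id = None


def dump_records(in_path, out_path):
    """Read in_path line by line; write one 'id<TAB>full name' line per entry."""
    with open(in_path) as source, open(out_path, "w") as target:
        for user_id, full_name in extract_records(source):
            target.write("%s\t%s\n" % (user_id, full_name))


sample = """\
text text text <ID>Joseph</text text text>
text text text text text text text text text text text
text text text <Full name> Joseph Smith</text text text>
text text text <Rights> 1</text text text>
""".splitlines()

print(list(extract_records(sample)))
```

With real data, something like dump_records("users.txt", "names.txt") would do the whole job; both file names are placeholders. Keeping the ID in a variable until the full name is seen is the simplest way to keep the two values of one entry together. If the real file has entries without a full name, this part would need adjusting.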