Hi Spir and tutors,

Thank you Spir for your response. I went ahead and tried your code after adding a couple of dictionary entries, as below:

-----------Code Begins---------------
#!usr/bin/python
tags = {
    'case_def_gen':['case_def','gen','null'],
    'nsuff_fem_pl':['nsuff','null', 'null'],
    'abbrev': ['abbrev, null, null'],
    'adj': ['adj, null, null'],
    'adv': ['adv, null, null'],}  # tag dict
TAB = '\t'

def newlyTaggedWord(line):
    (word,tag) = line.split(TAB)   # separate parts of line, keeping data only
    new_tags = tags['tag']         # read in dict--Index by string
    tagging = TAB.join(new_tags)   # join with TABs
    return word + TAB + tagging    # formatted result

def replaceTagging(source_name, target_name):
    source_file = file(source_name, 'r')
    source = source_file.read()    # not really necessary
    target_file = open(target_name, "w")
    # replacement loop
    for line in source:
        new_line = newlyTaggedWord(line) + '\n'
        target_file.write(new_line)
    source_file.close()
    target_file.close()

if __name__ == "__main__":
    source_name = sys.argv[1]
    target_name = sys.argv[2]
    replaceTagging(source_name, target_name)
-----------Code Ends---------------

The file I am working on looks like this:

word \t case_def_gen
word \t nsuff_fem_pl
word \t adj
word \t abbrev
word \t adv

I get the following error when I try to run it, and I cannot figure out where the problem lies:

-----------Error Begins---------------
Traceback (most recent call last):
  File "tag.formatter.py", line 36, in ?
    replaceTagging(source_name, target_name)
  File "tag.formatter.py", line 28, in replaceTagging
    new_line = newlyTaggedWord(line) + '\n'
  File "tag.formatter.py", line 16, in newlyTaggedWord
    (word,tag) = line.split(TAB) # separate parts of line, keeping data only
ValueError: unpack list of wrong size
-----------Error Ends---------------

Any ideas?

Thank you!

--dan

From: Dan Liang <danlian...@gmail.com>
Subject: [Tutor] Iterating over a long list with regular expressions and changing each item?
To: tutor@python.org
Message-ID: <a0e59afb0905031859k1d54bddck91955eb5b90ae...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

> Hi tutors,
>
> I am working on a file and need to replace each occurrence of a certain
> label (part of speech tag in this case) by a number of sub-labels. The
> file has the following format:
>
> word1 \t Tag1
> word2 \t Tag2
> word3 \t Tag3
>
> Now the tags are complex and I wanted to split them in a tab-delimited
> fashion to have this:
>
> word1 \t Tag1Part1 \t Tag2Part2 \t Tag3Part3
>
> I searched online for some solution and found the code below which uses a
> dictionary to store the tags that I want to replace in keys and the
> sub-tags as values. The problem with this is that it sometimes replaces
> tags that are not surrounded by spaces, which I do not like to happen.
> Also, I wanted each new sub-tag to be followed by a tab, so that the new
> items that I end up having in my file are tab-delimited. For this, I put
> tabs between the items of each key in the dictionary. I started thinking
> that this will not be the best solution of the problem and perhaps a
> script that uses regular expressions would be better. Since I am new to
> Python, I thought I should ask you for your thoughts for a best solution.
> The items I want to replace are about 150 and I did not know how to
> iterate over them with regular expressions.
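On the regular-expression idea in the paragraph above: one possible approach (a sketch only, not something from the thread) is to compile a single pattern from all the dictionary keys instead of looping over them, and to use lookarounds so that a tag only matches when it has whitespace or the start/end of the text on both sides. The names SUB_TAGS, TAG_PATTERN and expand_tags are made up for this illustration, the table holds only three of the roughly 150 entries, and the code is written for Python 3.

```python
import re

# Hypothetical subset of the tag table; the real one would hold ~150 entries.
SUB_TAGS = {
    "abbrev": ["abbrev", "null", "null"],
    "adj": ["adj", "null", "null"],
    "case_def_gen": ["case_def", "gen", "null"],
}

# One alternation of every tag, longest first. The lookarounds (?<!\S) and
# (?!\S) demand whitespace or the edge of the text on each side, so "adj"
# inside "adjective" is left alone.
TAG_PATTERN = re.compile(
    r"(?<!\S)(?:%s)(?!\S)"
    % "|".join(re.escape(tag) for tag in sorted(SUB_TAGS, key=len, reverse=True))
)


def expand_tags(text):
    """Replace each stand-alone tag with its tab-separated sub-tags."""
    return TAG_PATTERN.sub(lambda m: "\t".join(SUB_TAGS[m.group(0)]), text)
```

Because the substitution is a single pass, text that has already been replaced is not scanned again. One caveat: a word in the first column that happens to be spelled exactly like a tag would be expanded too, so splitting each line on the tab and looking the tag up in the dictionary may still be the safer route for this file layout.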
> Below is my previous code:
>
> #!usr/bin/python
>
> import re, sys
> f = file(sys.argv[1])
> readed= f.read()
>
> def replace_words(text, word_dic):
>     for k, v in word_dic.iteritems():
>         text = text.replace(k, v)
>     return text
>
> # the dictionary has target_word:replacement_word pairs
>
> word_dic = {
> 'abbrev': 'abbrev null null',
> 'adj': 'adj null null',
> 'adv': 'adv null null',
> 'case_def_acc': 'case_def acc null',
> 'case_def_gen': 'case_def gen null',
> 'case_def_nom': 'case_def nom null',
> 'case_indef_acc': 'case_indef acc null',
> 'verb_part': 'verb_part null null'}
>
> # call the function and get the changed text
>
> myString = replace_words(readed, word_dic)
>
> fout = open(sys.argv[2], "w")
> fout.write(myString)
> fout.close()
>
> --dan
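Regarding the ValueError reported at the top of this message, a likely reading is this: source_file.read() returns the whole file as one string, so "for line in source" walks it one character at a time, and splitting a single character on TAB cannot produce the two parts that (word,tag) expects. Two further things in the posted code would probably surface next: tags['tag'] looks up the literal string 'tag' rather than the variable, and entries such as ['abbrev, null, null'] are one string instead of three list items. Below is a hedged sketch of a corrected version, written for Python 3 (so open() replaces file()). The snake_case names, the stripping of spaces around the tab, and the choice to pass unknown tags and malformed lines through unchanged are assumptions made for this example, not part of Spir's code.

```python
TAB = "\t"

# Same entries as in the message, with each sub-tag as its own list item.
TAGS = {
    "case_def_gen": ["case_def", "gen", "null"],
    "nsuff_fem_pl": ["nsuff", "null", "null"],
    "abbrev": ["abbrev", "null", "null"],
    "adj": ["adj", "null", "null"],
    "adv": ["adv", "null", "null"],
}


def newly_tagged_word(line):
    """Turn one 'word<TAB>tag' line into 'word<TAB>sub1<TAB>sub2<TAB>sub3'."""
    # Strip the newline and any spaces around the tab (assumed to be noise).
    parts = [part.strip() for part in line.split(TAB)]
    if len(parts) != 2:
        return line.rstrip("\n")  # blank or malformed line: pass it through
    word, tag = parts
    new_tags = TAGS.get(tag, [tag])  # index by the variable; keep unknown tags
    return word + TAB + TAB.join(new_tags)


def replace_tagging(source_name, target_name):
    """Rewrite every line of source_name into target_name."""
    with open(source_name) as source, open(target_name, "w") as target:
        for line in source:  # iterating the file object yields whole lines
            target.write(newly_tagged_word(line) + "\n")
```

It could be called as replace_tagging(sys.argv[1], sys.argv[2]) after an "import sys", which the posted code does not show.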
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor