On Tue, 5 May 2009 00:22:45 -0400, Dan Liang <danlian...@gmail.com> wrote:
> -------------Begin data----------------------------
>
> w1 \t case_def_acc \t yes
> w2 \t noun_prop \t no
> w3 \t case_def_gen \t
> w4 \t dem_pron_f \t no
> w3 \t case_def_gen \t
> w4 \t dem_pron_f \t no
> w1 \t case_def_acc \t yes
> w3 \t case_def_gen \t
> w3 \t case_def_gen \t
>
> -------------End data----------------------------
>
> I tried to make changes to the code above by changing the function where we
> read the dictionary, but it did not work. While it is ugly, I include it as
> a proof that I have worked on the problem. I am sure you will have various
> nice ideas.
>
> -------------Begin code----------------------------
> def newlyTaggedWord(line):
>     tagging = ""
>     line = line.split(TAB)    # separate parts of line, keeping data only
>     if len(line)==3:
>         word = line[-3]
>         tag = line[-2]
>         new_tags = tags[tag]
>         decision = line[-1]
>         # in decision I wanted to store either yes or no if one of
>         # these existed
>     elif len(line)==2:
>         word = line[-2]
>         tag = line[-1]
>         decision = TAB
>         # I thought if it is a must to put sth in decision while decision
>         # is really absent in line, I would put a tab. But I really want to
>         # avoid putting anything there.
>     new_tags = tags[tag]            # read in dict
>     tagging = TAB.join(new_tags)    # join with TABs
>     return word + TAB + tagging + TAB + decision
> -------------End code----------------------------

For simplicity, it would be cool if the file had some placeholder in place of the absent yes/no 'decisions', so that you know there are always 3 fields. That is what would work best in most languages. But Python is rather flexible and clever in such border cases. Watch the example below:

    s1, s2 = "1\t2\t3", "1\t2\t"
    items1, items2 = s1.split('\t'), s2.split('\t')
    print items1, items2
    ==> ['1', '2', '3'] ['1', '2', '']

So you always get 3 items, the 3rd one possibly being the empty string. Right?
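A minimal sketch of the same splitting behaviour on whole lines as they would come out of a file, assuming the fields are separated by bare TAB characters and that each line ends with a newline. The helper name split_fields is made up for illustration. Note that it strips only the line ending, since a plain strip() would also remove the trailing TAB that marks the empty third field:

```python
TAB = "\t"

def split_fields(line):
    """Hypothetical helper: split one data line into its TAB-separated fields."""
    # Remove only the line ending. A plain strip() would also eat the
    # trailing TAB, and the empty third field would be lost.
    return line.rstrip("\r\n").split(TAB)

# One line with a yes/no decision, one line without (assumed sample lines).
with_decision = split_fields("w1\tcase_def_acc\tyes\n")
without_decision = split_fields("w3\tcase_def_gen\t\n")

print(with_decision)     # ['w1', 'case_def_acc', 'yes']
print(without_decision)  # ['w3', 'case_def_gen', '']
```

Either way, each line yields exactly three fields.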
This means:
* You can safely write "(word,tag,decision) = line.split(TAB)".
  [Beware of misleading naming like "line = line.split(TAB)", for after this the name 'line' actually refers to the field values.]
* You can have a single process for both kinds of lines.
* The elif branch in your code above will never run, I guess ;-)
  [Place a print instruction inside to check that.]

Denis

PS: I noticed that in your final version, for the case of files with 2 fields only, you misplaced the file closings. They fit better in the func.
------
la vita e estrany
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor