Re: [Tutor] Replacing fields in lines of various lengths

Alan Gauld Tue, 05 May 2009 02:44:11 -0700


"Dan Liang" <danlian...@gmail.com> wrote

And I put together the code below based on your suggestions, with minor
changes, and it does work.


Good, now your question is?


-------------Begin code----------------------------

#!/usr/bin/python
import sys
tags = {
'noun-prop': 'noun_prop null null'.split(),
'case_def_gen': 'case_def gen null'.split(),
'dem_pron_f': 'dem_pron f null'.split(),
'case_def_acc': 'case_def acc null'.split(),
}


TAB = '\t'


def newlyTaggedWord(line):
      line = line.rstrip()     # I strip line ending
      (word,tag) = line.split(TAB)    # separate parts of line, keeping data only
      new_tags = tags[tag]          # read in dict
      tagging = TAB.join(new_tags)    # join with TABs
      return word + TAB + tagging   # formatted result

def replaceTagging(source_name, target_name):
      target_file = open(target_name, "w")
      # replacement loop
      for line in open(source_name, "r"):
          new_line = newlyTaggedWord(line) + '\n'
          target_file.write(new_line)

source_name.close()
target_file.close()

AG> These two lines should be inside the function, after the loop.


if __name__ == "__main__":
      source_name = sys.argv[1]
      target_name = sys.argv[2]
      replaceTagging(source_name, target_name)

-------------End code----------------------------
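Here is a minimal sketch of replaceTagging with the AG note applied. It assumes Python 2.7 or later, which is needed to open two files in one with statement. Note that source_name is a file name (a string), so source_name.close() would raise AttributeError even inside the function. Opening both files in a with block closes the real file objects automatically when the loop ends. The tag table and newlyTaggedWord are repeated only so the sketch runs on its own. The existing if __name__ == "__main__": block (with import sys) can call replaceTagging unchanged.

```python
TAB = '\t'

# Same tag table as in the code above.
tags = {
    'noun-prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}


def newlyTaggedWord(line):
    word, tag = line.rstrip().split(TAB)  # expects exactly two fields
    return word + TAB + TAB.join(tags[tag])


def replaceTagging(source_name, target_name):
    # source_name and target_name are file names (strings); the file
    # objects opened here are closed automatically when the block ends,
    # even if newlyTaggedWord raises an exception part-way through.
    with open(source_name) as source_file, open(target_name, "w") as target_file:
        for line in source_file:
            target_file.write(newlyTaggedWord(line) + '\n')
```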


Now I have to work on a different data format, as follows:

-------------Begin data----------------------------

w1    \t   case_def_acc   \t          yes
w2    \t   noun_prop   \t               no
w3    \t   case_def_gen   \t
w4    \t   dem_pron_f   \t             no
w3    \t   case_def_gen   \t
w4    \t   dem_pron_f   \t             no
w1    \t   case_def_acc   \t          yes
w3    \t   case_def_gen   \t
w3    \t   case_def_gen   \t

-------------End data----------------------------
Notice that some lines have nothing in the yes/no field, and hence end in a
tab.
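The following shows how Python splits the two kinds of line, which explains why the field count varies. It assumes each \t in the listing above stands for a single tab character and that the surrounding spaces are only for display. The key point is that a bare rstrip() removes the trailing tab along with the newline, while rstrip('\n') keeps it.

```python
TAB = '\t'

full = "w1\tcase_def_acc\tyes\n"
empty = "w3\tcase_def_gen\t\n"

# Stripping only the newline keeps the trailing tab, so the empty
# yes/no field survives as '' and both lines split into three fields.
print(full.rstrip('\n').split(TAB))    # ['w1', 'case_def_acc', 'yes']
print(empty.rstrip('\n').split(TAB))   # ['w3', 'case_def_gen', '']

# A bare rstrip() also removes the trailing tab, so the field count
# changes from line to line.
print(empty.rstrip().split(TAB))       # ['w3', 'case_def_gen']
```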

My question is how to replace the composite tag in the tag field with
sub-tags like those in the dictionary values above, and still print the
whole line with only this change (i.e., composite tags replaced by
sub-tags). Earlier, we used the word and tag from each line directly with
the dictionary, since we were sure each line had 2 fields after splitting
on tabs. Here, lines have different numbers of fields: some end with yes
or no, and some do not.

I tried to change the code above by modifying the function where we
read the dictionary, but it did not work. While it is ugly, I include it as
proof that I have worked on the problem. I am sure you will have various
nice ideas.


-------------Begin code----------------------------
def newlyTaggedWord(line):
      tagging = ""

      line = line.split(TAB) # separate parts of line, keeping data only

      if len(line)==3:
          word = line[-3]
          tag = line[-2]
          new_tags = tags[tag]
          decision = line[-1]
          # in decision I wanted to store either yes or no if one of these existed

      elif len(line)==2:
          word = line[-2]
          tag = line[-1]
          decision = TAB
          # I thought that if I must put something in decision while decision
          # is really absent in the line, I would put a tab. But I really want
          # to avoid putting anything there.

          new_tags = tags[tag]          # read in dict
          tagging = TAB.join(new_tags)    # join with TABs
          return word + TAB + tagging + TAB + decision
-------------End code----------------------------
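One way to avoid branching on the number of fields is to replace only the tag field and pass everything else through unchanged. This is a sketch, not a tested solution for the real files. It assumes the word is always the first field and the composite tag the second. It also keys the (hypothetical) table by 'noun_prop', because that is the spelling in the new data, while the table in the first listing uses 'noun-prop'.

```python
TAB = '\t'

# Hypothetical tag table: same values as above, but keyed by the
# spellings that appear in the new data ('noun_prop', not 'noun-prop').
tags = {
    'noun_prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}


def newlyTaggedWord(line):
    # Remove only the line ending (\n or \r\n); a trailing tab, i.e. an
    # empty yes/no field, is kept.
    fields = line.rstrip('\r\n').split(TAB)
    # fields[0] is the word and fields[1] the composite tag. The slice
    # assignment swaps that one field for its sub-tags; anything after it
    # (yes, no, or an empty field) is passed through unchanged.
    fields[1:2] = tags[fields[1]]
    return TAB.join(fields)
```

Because only the line ending is stripped, a line that ended in a tab still ends in a tab after retagging. Lines with two fields and lines with three fields go through the same code path, so replaceTagging can keep appending '\n' as before.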


I appreciate your support!

--dan



--------------------------------------------------------------------------------

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


