Hi,
 
I'm learning Python so I can take advantage of the really cool stuff in the 
Natural Language Toolkit. But I'm having problems with some basic file 
manipulation stuff.
 
My basic question: How do I read data in from a csv, manipulate it, and then 
add it back to the csv in new columns (keeping the manipulated data in the 
"right row")?
 
Here's an example of what my data looks like ("test-8-29-10.csv"):
 



MyWord      Category  Ct     CatCt
!           A         2932   456454
!           B         2109   64451
a           C         7856   90000
a           A         19911  456454
abnormal    C         174    90000
abnormally  D         5      77777
cats        E         1999   886454
cat         B         160    64451



 
# I want to read in the MyWord value for each row, process it, and add the 
results as new columns. Specifically, I want to "lemmatize" and "stem", which 
basically means I'll turn "abnormally" into "abnormal" and "cats" into "cat".
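
For reference, here is a minimal sketch of the read / process / write-back round trip, assuming Python 3 and the standard csv module. The helpers toy_lemmatize and toy_stem, the function add_columns, and the new column names "Lemma" and "Stem" are all made up for illustration; the two toy helpers are crude stand-ins for the NLTK lemmatizer and stemmer so that the sketch runs without NLTK installed. The input is an in-memory copy of two rows of the sample data rather than the real file.

```python
import csv
import io

# Hypothetical stand-ins for the NLTK lemmatizer and stemmer so the sketch
# runs without NLTK; in real use, pass the real functions instead.
def toy_lemmatize(word):
    return word[:-1] if word.endswith("s") and len(word) > 1 else word

def toy_stem(word):
    return word[:-2] if word.endswith("ly") else word

def add_columns(in_file, out_file, lemmatize=toy_lemmatize, stem=toy_stem):
    """Copy every row from in_file to out_file, appending two new columns."""
    reader = csv.DictReader(in_file)
    # Keep the original column order and put the new columns at the end.
    fieldnames = list(reader.fieldnames) + ["Lemma", "Stem"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # The new values go into the same dict as the old ones,
        # so they stay attached to the right row.
        row["Lemma"] = lemmatize(row["MyWord"])
        row["Stem"] = stem(row["MyWord"])
        writer.writerow(row)

# In-memory sample built from two rows of the example data
sample = io.StringIO(
    "MyWord,Category,Ct,CatCt\n"
    "abnormally,D,5,77777\n"
    "cats,E,1999,886454\n"
)
result = io.StringIO()
add_columns(sample, result)
output_text = result.getvalue()
print(output_text)
```

With real files, the same function could be called with two file objects opened with newline="" (one for reading, one for writing to a new file), which is usually safer than overwriting the original csv in place.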
 
import nltk
wnl=nltk.WordNetLemmatizer()
porter=nltk.PorterStemmer()
text=nltk.word_tokenize(TheStuffInMyWordColumn)
textlemmatized=[wnl.lemmatize(t) for t in text]
textPort=[porter.stem(t) for t in text]
 
# This creates the right info, but I don't really want "textlemmatized" and 
"textPort" to be independent lists. I want them inside the csv in new columns. 
 
# If I didn't want to keep the information in the Category and Counts columns, 
I would probably do something like this:
 
for word in text:
    word2=wnl.lemmatize(word)
    word3=porter.stem(word)
    print(word+";"+word2+";"+word3+"\r\n")
 
# Looking through some of the older discussions about the csv module, I found 
this code, which helps identify headers, but I'm still not sure how to use 
them, or how to write the for-loop so that it iterates through each row in 
the csv file. 
 
import csv
fp=open(r'c:test-8-29-10.csv', 'r')
inputfile=csv.DictReader(fp)
for record in inputfile:
    print(record)
# Output:
# {'Category': 'A', 'CatCt': '456454', 'MyWord': '!', 'Ct': '2932'}
# {'Category': 'B', 'CatCt': '64451', 'MyWord': '!', 'Ct': '2109'}
# ...
fp.close() 
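
As a rough illustration of how the headers might be used, the sketch below (assuming Python 3) reads the column names from the reader's fieldnames attribute and then pulls a single column out of each record by name. The in-memory fp is only a stand-in for the real file, and the variable names headers and words are made up for the example.

```python
import csv
import io

# In-memory stand-in for the real file, using two rows of the sample data
fp = io.StringIO(
    "MyWord,Category,Ct,CatCt\n"
    "!,A,2932,456454\n"
    "cats,E,1999,886454\n"
)
reader = csv.DictReader(fp)

# The header row is available as a list of column names
headers = reader.fieldnames

# Each record is a dict keyed by header, so one column can be picked by name
words = [record["MyWord"] for record in reader]

print(headers)
print(words)
```

Because each record is keyed by the header names, the loop body can refer to record["MyWord"] directly instead of counting column positions.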
 
# So I feel like I have *some* of the pieces, but I'm just missing a bunch of 
little connections. Any and all help would be much appreciated!
 
Tyler
_______________________________________________
Tutor maillist  -  Tutor@python.org
