Hi everyone,

Thanks for all the suggestions. Let me preface this by saying that I'm new to both Python and programming. I started learning three months ago with online tutorials and by reading the questions you guys post. So thank you all very, very much, and I apologize if I'm doing something really stupid. :-)

OK. I've solved the problem of opening several files to process as a batch with glob.glob(). Only now did I realize that, with a bare pattern like '*.txt', the files need to be in the folder the program is run from.

Now I have another problem.

1- I want to open several files and count the total number of words. With only one file, it works great. With several files (now with glob), it outputs the count for each file individually and not for the whole corpus (see the comments in the program below).

2- I also want the program to output a word frequency list (we do this a lot in corpus linguistics). With only one file, the program works great (with a dictionary). With several files, I end up with several frequency lists, one for each file.

This sounds like a loop type of problem, doesn't it? I looked at the indentation too and I can't find what the problem is.

Your comments, suggestions, etc. are greatly appreciated. Thanks again for all your help.

Paulo

Here goes what I have.

# The program is intended to output a word frequency list (including all
# words in all files) and the total word count

import glob

def sortfile():  # I created a function
    filename = glob.glob('*.txt')  # this works great! Thanks!
    for allfiles in filename:
        infile = open(allfiles, 'r')
        lines = list(infile)
        infile.close()
        words = []  # initializes list of words
        wordcounter = 0
        for line in lines:
            line = line.lower()
            # after this, I have some clunky code to get rid of punctuation
            words = words + line.split()
        wordfreq = [words.count(wrd) for wrd in words]  # counts the freq of each word in a list
        dictionary = dict(zip(words, wordfreq))
        frequency_list = [(dictionary[key], key) for key in dictionary]
        frequency_list.sort()
        frequency_list.reverse()
        for item in frequency_list:
            wordcounter = wordcounter + 1
            print item
        print "Total # of words:", wordcounter  # if I put it here, I get the total count after each file
    print "Total # of words:", wordcounter  # this will give the word count of the last file the program reads.

sortfile()
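
A minimal sketch of the whole-corpus behaviour asked for above, not a definitive fix. It assumes the per-file results come from the word list and the counter being reset, and the frequency list being built and printed, inside the loop over files. Here the dictionary and the running total are created once, before that loop, and the list is sorted once, after it. The function name corpus_frequencies and its filenames parameter are made up for illustration, and punctuation stripping is left out.

```python
def corpus_frequencies(filenames):
    """Return (total_words, frequency_list) for all files taken together."""
    counts = {}       # word -> frequency, shared by every file
    total_words = 0   # running total, also shared by every file

    for name in filenames:
        with open(name, 'r') as infile:
            for line in infile:
                # Punctuation handling is omitted in this sketch.
                for word in line.lower().split():
                    counts[word] = counts.get(word, 0) + 1
                    total_words += 1

    # Build and sort the list only once, after the last file has been read.
    frequency_list = [(freq, word) for word, freq in counts.items()]
    frequency_list.sort(reverse=True)
    return total_words, frequency_list
```

A call such as corpus_frequencies(glob.glob('*.txt')) would then give one total and one frequency list for the whole batch. Note that total_words counts every word occurrence, while len(frequency_list) gives the number of distinct words.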
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor