Hi everyone,
  Thanks for all the suggestions. Let me just preface this by saying that I’m
new to both Python and programming. I started learning 3 months ago with online
tutorials and by reading the questions you guys post. So, thank you all very,
very much... and I apologize if I’m doing something really stupid. :-)
                                                                                
  OK. I’ve solved the problem of opening several files to process “as a batch”
with glob.glob(). Only now did I realize that, with a pattern like '*.txt', the
program has to be run from the folder that contains the files. Now I have
another problem.
  1- I want to open several files and count the total number of words. If I do
this with only 1 file, it works great. With several files (now with glob), it
outputs the total count for each file individually and not for the whole corpus
(see the comments in the program below).
  2- I also want the program to output a word frequency list (we do this a lot
in corpus linguistics). When I do this with only one file, the program works
great (with a dictionary). With several files, I end up with several frequency
lists, one for each file. This sounds like a loop type of problem, doesn’t it?
I looked at the indentation too and I can’t find what the problem is. Your
comments, suggestions, etc. are greatly appreciated. Thanks again for all your
help.
Paulo
  Here is what I have.
  # The program is intended to output a word frequency list (including all
  # words in all files) and the total word count
  import glob

  def sortfile():  # I created a function
      filename = glob.glob('*.txt') # this works great! Thanks!
      for allfiles in filename:
          infile = open(allfiles, 'r')
          lines = list(infile)
          infile.close()
          words = [] # initializes list of words
          wordcounter = 0
          for line in lines:
              # after this, I have some clunky code to get rid of punctuation
              line = line.lower()
              words = words + line.split()
          # counts the freq of each word in a list
          wordfreq = [words.count(wrd) for wrd in words]
          dictionary = dict(zip(words, wordfreq))
          frequency_list = [(dictionary[key], key) for key in dictionary]
          frequency_list.sort()
          frequency_list.reverse()
          for item in frequency_list:
              wordcounter = wordcounter + 1
              print item
          # print "Total # of words:", wordcounter
          # if I put it here, I get the total count after each file
      # this will give the word count of the last file the program reads.
      print "Total # of words:", wordcounter
  sortfile()
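
  For comparison, here is a minimal sketch of one way the counts could be
accumulated over the whole corpus instead of per file. It is only an
illustration, not a tested replacement for the program above. The function name
count_corpus and its pattern parameter are made up for this example, it assumes
Python 2.7 or later (for collections.Counter), and the punctuation clean-up is
left out. The key difference is that the counter is created once, before the
loop over the files, so it is never reset. Note also that the loop over
frequency_list above adds 1 per distinct word, so this sketch sums the
frequencies to get the total number of words.

```python
import glob
from collections import Counter


def count_corpus(pattern='*.txt'):
    """Return (total_words, frequencies) for every file matching pattern.

    Hypothetical helper for illustration; not part of the original program.
    """
    frequencies = Counter()  # created once, BEFORE the loop over files
    for path in glob.glob(pattern):
        with open(path, 'r') as infile:
            for line in infile:
                # punctuation stripping (not shown in the post) would go here
                for word in line.lower().split():
                    frequencies[word] += 1
    # total running words in the corpus, not the number of distinct words
    total_words = sum(frequencies.values())
    return total_words, frequencies
```

  Calling total, freq = count_corpus('*.txt') and then looping over
freq.most_common() would give (word, count) pairs from the most to the least
frequent, and total could be printed once after that loop.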
   
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
