Martin, Alan, col speed and everybody that helped:

I think I'm going to stop because I'm repeating myself, but it is difficult for me not to be profuse in my thanks because you guys really go beyond the call of duty. I love this list. Most of the time the responses on this list don't just address the problem at hand; they are also useful in a more general sense and help people become better programmers. So, thanks for all the good advice as well as for helping me solve the particular problem I had.
Let me address some particular points you've made:

On Sun, Nov 21, 2010 at 12:01 AM, Martin A. Brown <mar...@linux-ip.net> wrote:

> : It turns out that matters of efficiency appear to be VERY
> : important in this case. The example in my message was a very
<snip>
> Efficiency is best addressed first and foremost, not by hardware,
> but by choosing the correct data structure and algorithm for
> processing the data. You have more than enough hardware to deal
> with this problem,

Yes indeed. Now that I have fixed the code following your advice and Alan's, the script takes only a few seconds to run and yield the desired results. Big sigh of relief: my investment in a powerful computer was not in vain.

<snip>
> This is far afield from the question of word count, but may be
> useful someday.
>
> The beauty of a multiple processors is that you can run independent
> processes simultaneously (I'm not talking about multitasking).
<snip>
> http://docs.python.org/library/threading.html
> http://www.devshed.com/c/a/Python/Basic-Threading-in-Python/
> http://www.dabeaz.com/python/GIL.pdf

VERY useful information, thanks!

> OK, on to your code.
>
> : def countWords(wordlist):
> :     word_table = {}
> :     for word in wordlist:
> :         count = wordlist.count(word)
> :         print "word_table[%s] = %s" % (word,word_table.get(word,'<none>'))
> :         word_table[word] = count
>
> Problem 1: You aren't returning anything from this function.
> Add:
>     return word_table

Sorry. I had a lot of comments in my code (I'm learning and I want to document profusely everything I do so that I don't have to reinvent the wheel every time I try to do something), so before posting it here I did a lot of deleting. Unintentionally I deleted the following line (suggested in Steve's original message), which contained the return:

    return sorted(word_table.items(), key=lambda item: item[1], reverse=True)

Even adding this, though, the process was taking too long and I had to kill it.
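For anyone reading along, here is a minimal sketch of why that version crawled, as I understand Martin's and Alan's explanations. The function names and the sample sentence are made up for illustration; they are not from anyone's actual code.

```python
def count_words_slow(wordlist):
    # wordlist.count(word) rescans the whole list for every word,
    # so the work grows roughly with len(wordlist) squared.
    word_table = {}
    for word in wordlist:
        word_table[word] = wordlist.count(word)
    return word_table


def count_words_fast(wordlist):
    # A single pass over the list; each dictionary update takes
    # constant time on average.
    word_table = {}
    for word in wordlist:
        word_table[word] = word_table.get(word, 0) + 1
    return word_table


words = "the cat and the dog and the bird".split()
# Both versions produce the same table; only the running time differs.
same_result = count_words_slow(words) == count_words_fast(words)
```

On a short list like this one the difference is invisible, which would explain why a test with a small file gives no warning.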
When I fixed my mistake in Peter Otten's code (see below) everything worked like a charm.

By the way, I know what a lambda function is and I have read about the key parameter in sorted(), but I don't understand very well what "key=lambda item: item[1]" does. It has to do with taking the value '1' as the term for comparison, I guess, since this returns a list ordered by the number of times a word appears in the text, from the most frequent to the least frequent, and reverse=True is what changes the order in which it is sorted. What I don't understand is the syntax of "item: item[1]".

<snip>
> : def countWords2(wordlist): #as proposed by Peter Otten
> :     word_table = {}
> :     for word in wordlist:
> :         if word in word_table:
> :             word_table[word] += 1
> :         else:
> :             word_table[word] = 1
> :         count = wordlist.count(word)
> :         word_table[word] = count
> :     return sorted(
> :         word_table.items(), key=lambda item: item[1], reverse=True
> :     )
>
> In the above, countWords2, why not omit these lines:
>
> :         count = wordlist.count(word)
> :         word_table[word] = count

Sorry, this was my mistake, and it is what was responsible for the script hanging. This is the problem with cutting and pasting code and not reviewing what you copied. I took Steve's code as the basis and tried to modify it with Peter's code, but then I forgot to delete these two lines that were in Steve's code. Since it worked in the test I did with the light file, I didn't even bother to check it. Live and learn.

<snip>
> Let try a (bit of a labored) analogy of your problem. To
> approximate your algorithm.
>
> I have a clear tube with gumballs of a variety of colors.
> I open up one end of the tube, and mark where I'm starting.
<snip>

This is what I said at the beginning. This little analogy was pedagogically very sound. Thanks! I really appreciate your time (and I hope others will as well).

<snip>
> Once you gain familiarity with the lists and dicts, you can try out
> collections, as suggested by Peter Otten.
The problem is that I'm using Python 2.6.1. I have a Mac and I am using a package called NLTK to process natural language. I tried to install newer versions of Python on the Mac but the result was a mess. The NLTK modules worked well with the default Python installation but not with the newer versions I installed. It is recommended not to delete the default version of Python on the Mac because it might be used by the system or by some applications. So I had to go back to the Python version that comes installed by default on the Mac.

Josep M.
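P.S. A note for anyone else stuck on 2.6. Assuming the suggestion was collections.Counter, my understanding is that it only arrived in Python 2.7, while collections.defaultdict has been available since 2.5. So a sketch along these lines should run on the stock Mac Python (count_words and the sample sentence are names I made up for illustration):

```python
from collections import defaultdict


def count_words(wordlist):
    # defaultdict(int) supplies 0 the first time a word is seen,
    # so no "if word in word_table" check is needed.
    word_table = defaultdict(int)
    for word in wordlist:
        word_table[word] += 1
    # sorted() calls the key function once per (word, count) pair and
    # orders the pairs by the value it returns, here the count.
    # reverse=True puts the largest counts first.
    return sorted(word_table.items(), key=lambda item: item[1], reverse=True)


ranking = count_words("the cat and the dog and the bird".split())
```

The first entries of ranking are the most frequent words; words with equal counts come out in no particular guaranteed order.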