On Tue, Sep 14, 2010 at 10:29 AM, Roelof Wobben <rwob...@hotmail.com> wrote:
I offer my solution. I didn't bother to make every word lower case, though I think that would improve the result. Please offer critique and improvements.

Some explanation:

line 5 -- I read the complete text into full_text, replacing '--' with a space as I go
line 7 -- I split the full text string into words
lines 8 - 15 -- word by word, I strip all sorts of characters that aren't part of words from the front and back of each 'word'
lines 11 - 14 -- this is EAFP: try to add one to the bin for that word; if there is no such bin, create it with a count of 1
lines 16, 17 -- since dicts don't sort, I sort the keys, then loop through them to print out each key (word) and its count

> ----------------------------------------

 1 #! /usr/bin/env python
 2
 3 word_count = {}
 4 file = open('alice_in_wonderland.txt', 'r')
 5 full_text = file.read().replace('--', ' ')
 6
 7 full_text_words = full_text.split()
 8 for words in full_text_words:
 9     stripped_words = words.strip(".,!?'`\"- ();:")
10     ##print stripped_words
11     try:
12         word_count[stripped_words] += 1
13     except KeyError:
14         word_count[stripped_words] = 1
15
16 ordered_keys = word_count.keys()
17 ordered_keys.sort()
18 ##print ordered_keys
19 print "All the words and their frequency in 'alice in wonderland'"
20 for k in ordered_keys:
21     print k, word_count[k]
22

--
Joel Goldstick
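For comparison, the same steps can be sketched in Python 3, where collections.Counter replaces the EAFP try/except and str.lower() adds the case-folding mentioned above. The sample text here is made up for illustration, not taken from the book:

```python
from collections import Counter

# Illustrative sample text (not from the actual file).
text = "Alice was beginning--to get very tired. Alice!"

# Step 1: replace '--' with a space, then split into words.
words = text.replace('--', ' ').split()

# Step 2: strip punctuation from both ends and lower-case each word.
stripped = [w.strip(".,!?'`\"-();:").lower() for w in words]

# Step 3: Counter does the "add one to the bin" bookkeeping itself;
# empty strings (words that were all punctuation) are skipped.
counts = Counter(w for w in stripped if w)

# Step 4: print words in sorted order with their frequencies.
for word in sorted(counts):
    print(word, counts[word])
```

With lower-casing in place, "Alice" and "alice!" land in the same bin, which is the improvement hinted at in the first paragraph.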
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor