Script is at: <http://www.rcblue.com/Python/wordFrequency/wordFrequency.txt>
Example text file for input: <http://www.rcblue.com/Python/wordFrequency/first3000linesOfDavidCopperfield.txt> (142 kb) (from <http://www.gutenberg.org/etext/766>)
Example output in file: <http://www.rcblue.com/Python/wordFrequency/outputToFile.txt> (40 kb)
(Execution took about 30 sec. with my computer.)

I worked on this a LONG time for something I expected to just be an easy and possibly useful exercise. Three times I started completely over with a new approach. Had a lot of trouble removing exactly the characters I didn't want to appear in the output. Wished I knew how to debug other than just by using a lot of print statements.

Specifically, I'm hoping for comments on or help with:

1) How to debug. I'm using v2.4, IDLE on Win XP.

2) I've tried to put in remarks that will help most anyone to understand what the code is doing. Have I succeeded?

3) No modularization. Couldn't see a reason to do so. Is there one or two? Specifically, what sections should become modules, if any?

4) Variable names. I gave up on making them self-explanatory. Instead, I put in some remarks near the top of the script (lines 6-10) that I hope do the job. Do they? In the code, does the "L to newL to L to newL to L" kind of thing remain puzzling?

(lines 6-10)
# meaning of short variable names:
# S is a string
# c is a character of a string
# L, F are lists
# e is an element of a list

5) Ideally, abbreviations that end in a period, such as U.N., e.g., i.e., viz. op. cit., Mr. (Am. E.), etc., should not be stripped of their final periods (whereas other words that end a sentence SHOULD be stripped). I tried making and using a Python list of these, but it was too tough to write the code to use it. Any ideas? (I can live very easily without a solution to point 5, because if the output shows there are 10 "e.g"s, I'll just assume, and I think safely, that there actually are 10 "e.g."s. But I am curious, Pythonically.)

Thanks very much in advance, tutors.
Dick Moores
[EMAIL PROTECTED]
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
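
On point 5, one possible approach, offered only as a minimal sketch and not as a change to the script linked above: strip punctuation from both ends of each word as usual, then put the final period back if the result plus "." is on a list of known abbreviations and the original text really had that period. The names ABBREVIATIONS, clean_word and word_frequencies are hypothetical and do not come from wordFrequency.txt. The sketch assumes words are separated by whitespace, only ASCII punctuation needs stripping, and case is left as found. It avoids newer syntax so it should run on v2.4 as well as current Python.

```python
import string

# Hypothetical list of abbreviations that keep their final period.
# Stored in lowercase so that "Mr." and "mr." both match.
ABBREVIATIONS = set(["u.n.", "e.g.", "i.e.", "viz.", "op.", "cit.", "mr.", "etc."])


def clean_word(raw):
    """Strip punctuation from both ends of raw.

    The final period is kept only when the stripped word plus "." is a
    known abbreviation and that period was present in the original text.
    """
    # str.strip only touches the ends, so inner periods ("e.g") survive.
    stripped = raw.strip(string.punctuation)
    candidate = stripped + "."
    if candidate.lower() in ABBREVIATIONS and candidate in raw:
        return candidate
    return stripped


def word_frequencies(text):
    """Return a dict mapping each cleaned word in text to its count."""
    counts = {}
    for raw in text.split():
        word = clean_word(raw)
        if word:  # skip tokens that were nothing but punctuation
            counts[word] = counts.get(word, 0) + 1
    return counts
```

With this, clean_word("(e.g.,") gives "e.g." while clean_word("end.") gives "end". Because the text is split on whitespace, "op. cit." is counted as two separate entries, "op." and "cit."; treating multi-word abbreviations as a unit would need a different approach.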