Script is at:
<http://www.rcblue.com/Python/wordFrequency/wordFrequency.txt>

Example text file for input:
<http://www.rcblue.com/Python/wordFrequency/first3000linesOfDavidCopperfield.txt>
(142 KB, from <http://www.gutenberg.org/etext/766>)

Example output in file:
<http://www.rcblue.com/Python/wordFrequency/outputToFile.txt>
(40 KB)

(Execution took about 30 seconds on my computer.)

I worked on this a LONG time for something I expected to be just an easy
and possibly useful exercise. Three times I started over completely with
a new approach. I had a lot of trouble removing exactly the characters I
didn't want to appear in the output, and I wished I knew how to debug
other than by sprinkling print statements everywhere.
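One alternative to scattered print statements is the standard logging
module, whose messages can be switched off in one place instead of being
deleted one by one. The following is only a minimal sketch: the helper
strip_unwanted, the logger name, the set of unwanted characters, and the
sample sentence are all made up for illustration and are not taken from
the script.

```python
import logging

# Assumption: DEBUG while developing; change to logging.WARNING to silence
# every debug message at once.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("wordFrequency")  # hypothetical logger name


def strip_unwanted(word, unwanted=".,;:!?\"()"):
    """Hypothetical helper: trim unwanted characters from both ends of a word."""
    cleaned = word.strip(unwanted)
    # Record what went in and what came out, so a wrong result can be traced.
    log.debug("strip_unwanted(%r) -> %r", word, cleaned)
    return cleaned


sample = '"Whether I shall turn out (to be) the hero,'
result = [strip_unwanted(w) for w in sample.split()]
```

For stepping through code line by line, the standard pdb module
(import pdb; pdb.set_trace() at the point of interest) is another common
option, and IDLE's shell window has a Debug menu of its own.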

Specifically, I'm hoping for comments on or help with:
1) How to debug. I'm using Python 2.4 with IDLE on Windows XP.
2) I've tried to put in remarks that will help almost anyone understand
what the code is doing. Have I succeeded?
3) No modularization. I couldn't see a reason for it. Are there one or two?
Specifically, which sections should become modules, if any?
4) Variable names. I gave up on making them self-explanatory. Instead, I 
put in some remarks near the top of the script (lines 6-10) that I hope 
do the job. Do they? In the code, does the "L to newL to L to newL to L" 
kind of thing remain puzzling?

(lines 6-10)
# meaning of short variable names:
#   S is a string
#   c is a character of a string
#   L, F are lists
#   e is an element of a list

5) Ideally, abbreviations that end in a period, such as U.N., e.g., i.e.,
viz., op. cit., Mr. (Am. E.), etc., should not be stripped of their final
periods (whereas other words that end a sentence SHOULD be stripped). I
tried making and using a Python list of these, but writing the code to
use it proved too tough for me. Any ideas? (I can live very easily
without a solution to point 5, because if the output shows there are 10
"e.g"s, I'll just assume, and I think safely, that there actually are 10
"e.g."s. But I am curious, Pythonically.)
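To make point 5 concrete, here is a rough sketch of one way a list of
abbreviations might be used. It is not how the script works: the names
ABBREVIATIONS, PUNCTUATION, clean_word and count_words, the contents of
the abbreviation set, and the sample sentence are all assumptions made
for illustration. The idea is to strip punctuation from both ends of
each word, then put a single period back only if one was stripped and
the result is a known abbreviation.

```python
# Assumption: a small, hand-maintained set of abbreviations, in lowercase.
ABBREVIATIONS = set(["u.n.", "e.g.", "i.e.", "viz.", "op.", "cit.", "mr.", "etc."])
# Assumption: the characters to strip from the ends of words.
PUNCTUATION = ".,;:!?\"'()[]"


def clean_word(word):
    """Strip surrounding punctuation, keeping an abbreviation's final period."""
    left = word.lstrip(PUNCTUATION)
    stripped = left.rstrip(PUNCTUATION)
    trailing = left[len(stripped):]  # whatever rstrip() removed
    # Restore the period only if one was removed and the word is listed.
    if "." in trailing and (stripped + ".").lower() in ABBREVIATIONS:
        return stripped + "."
    return stripped


def count_words(text):
    """Return a dict mapping each cleaned, lowercased word to its count."""
    counts = {}
    for raw in text.split():
        word = clean_word(raw).lower()
        if word:  # skip tokens that were nothing but punctuation
            counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words("Mr. Dick said, e.g., that Mr. Micawber (i.e. the debtor) paid.")
```

Because the text is split on whitespace, "op. cit." is treated as two
separate words, and an abbreviation that happens to end a sentence keeps
its period like any other occurrence.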

Thanks very much in advance, tutors.

Dick Moores
[EMAIL PROTECTED]



_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
