[Tutor] Alice_in_wonderland Question
Hi,

So I'm doing a problem on the Alice_in_wonderland.txt where I have to write a program that reads a piece of text from a file specified by the user, counts the number of occurrences of each word, and writes a sorted list of words and their counts to an output file. The list of words should be sorted based on the counts, so that the most popular words appear at the top. Words with the same counts should be sorted alphabetically.

My code right now is:

    word_count = {}
    file = open ('alice_in_wonderland.txt', 'r')
    full_text = file.read().replace('--',' ')
    full_text_words = full_text.split()

    for words in full_text_words:
        stripped_words = words.strip(".,!?'`\"- ();:")
        try:
            word_count[stripped_words] += 1
        except KeyError:
            word_count[stripped_words] = 1

    ordered_keys = word_count.keys()
    sorted(ordered_keys)
    print ("All the words and their frequency in", 'alice in wonderland')
    for k in ordered_keys:
        print (k, word_count[k])

The output here is just all of the words in the document NOT SORTED by amount of occurrence.

I need help sorting this output of words in the Alice_in_wonderland.txt, as well as help asking the user for the input information about the files.

If anyone could give me some guidance you will really be helping me out.

Please and Thank you
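One possible way to get the ordering the assignment describes (highest count first, ties alphabetical) is to sort on a tuple key. The sketch below is only an illustration, not the course's expected solution: the helper names `count_words`, `sorted_counts` and `write_counts` are made up for this example, and it differs from the posted code in one respect, by skipping tokens that are empty after stripping punctuation.

```python
from collections import Counter

# Same characters the posted code strips from each word.
PUNCTUATION = ".,!?'`\"- ();:"


def count_words(text):
    """Return a Counter mapping each stripped word to its number of occurrences."""
    words = (w.strip(PUNCTUATION) for w in text.replace('--', ' ').split())
    # Assumption: tokens that were pure punctuation are dropped, not counted.
    return Counter(w for w in words if w)


def sorted_counts(word_count):
    """Return (word, count) pairs: most frequent first, ties in alphabetical order."""
    # Negating the count sorts it descending while the word still sorts ascending.
    return sorted(word_count.items(), key=lambda item: (-item[1], item[0]))


def write_counts(word_count, dest_path):
    """Write one 'word count' line per word to dest_path, in sorted order."""
    with open(dest_path, 'w') as destf:
        for word, count in sorted_counts(word_count):
            destf.write("{} {}\n".format(word, count))
```

Calling `sorted(ordered_keys)` on its own line, as in the posted code, builds a new sorted list and then throws it away; the result has to be assigned or iterated over directly, as `sorted_counts` does here.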
Re: [Tutor] Alice_in_wonderland Question
Hi,

I finally got it. This was the code:

    for k in sorted(word_count, key=lambda x:word_count[x], reverse=True):
        print (k, word_count[k])

The only question I have now is how to limit the output to the first 15 results.

On Sun, May 4, 2014 at 10:19 PM, Brian van den Broek <brian.van.den.br...@gmail.com> wrote:
> Hi Jake,
>
> Please do be sure to use Reply All rather than just Reply. I'm sending
> my reply and quotes from yours to the list; that way, others can
> follow along, learn and help.
>
> Also, in general, reply under the messages to which you respond,
> ideally trimming what isn't needed away. (You will see that is what I
> have done below.) Yes, that's not how email is used outside of
> technical circles. I'd maintain the technical circles' preference for
> not top posting is right. But, right or wrong, it is what those whom
> you are asking for free help prefer, so it is prudent to do it,
> gritting your teeth if you must :-)
>
> On 4 May 2014 21:36, Jake Blank wrote:
> > Hey Thanks for responding.
> >
> > So now my code looks like this:
> > from wordtools import extract_words
> >
> > source_filepath=input("Enter the path to the source file:")
> > dest_filepath =input("Enter the path to the destination file:")
> >
> > sourcef=open(source_filepath, 'r')
> > destf=open(dest_filepath, 'w')
> > for line in sourcef:
> >     destf.write(line)
> > file=input ("Would you like to process another file?(Y/N):")
> > if file== "Y":
> >     source_filepath=input("Enter the path to the source file:")
> >     dest_filepath =input("Enter the path to the destination file:")
> > else:
> >     word_count = {}
> >     file = open (source_filepath, 'r')
> >     full_text = file.read().replace('--',' ')
> >     full_text_words = full_text.split()
> >
> >     for words in full_text_words:
> >         stripped_words = words.strip(".,!?'`\"- ();:")
> >         try:
> >             word_count[stripped_words] += 1
> >         except KeyError:
> >             word_count[stripped_words] = 1
> >
> >     ordered_keys = word_count.keys()
> >     sorted(ordered_keys)
> >     print ('This is the output file for Alice in Wonderland')
> >     for k in sorted(ordered_keys):
> >         print (k, word_count[k])
> >
> > The first part about the user specifying the file is a little off but
> > besides that I am able to return all of the words in the story with the
> > number of times they occur alphabetically. In order to return the sorted
> > list by number of times that each word occurs I am a little confused if i
> > have to change something in my print statement? I understand how i have to
> > sort the words by their associated values i'm confused where in my code i
> > would do that.
> >
> > Thanks, Jake
>
> > On Sun, May 4, 2014 at 9:16 PM, Brian van den Broek wrote:
> >>
> >> On May 4, 2014 8:31 PM, "Jake Blank" wrote:
>
> >> Hi Jake,
> >>
> >> You are sorting the dictionary keys by the keys themselves, whereas
> >> what you want is the keys sorted by their associated values.
> >>
> >> Look at the key parameter in
> >> https://docs.python.org/3.4/library/functions.html#sorted.
> >>
> >> To get you started, here is an example in the vicinity:
> >>
> >> >>> data = ['abiab', 'cdocd', 'efaef', 'ghbgh']
> >> >>> sorted(data)
> >> ['abiab', 'cdocd', 'efaef', 'ghbgh']
> >> >>> sorted(data, key=lambda x:x[2])
> >> ['efaef', 'ghbgh', 'abiab', 'cdocd']
> >> >>> def get_third(x): return x[2]
> >> ...
> >> >>> sorted(data, key=get_third)
> >> ['efaef', 'ghbgh', 'abiab', 'cdocd']
> >> >>>
> >>
> >> In case the lambda version is confusing, it is simply a way of doing
> >> the get_third version without having to create a function outside of
> >> the context of the sorted expression.
> >>
> >> If that sorts you, great. If not, please do ask a follow-up. (I was
> >> trying not to do it for you, but also not to frustrate by giving you
> >> too little of a push.)
>
>
> So, the code in your s
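On the question of showing only the first 15 results: `sorted()` returns a list, so one straightforward option is to slice that list before looping over it. The sketch below assumes a `word_count` dictionary like the one built earlier in the thread; the function name `top_words` and the small sample dictionary are invented for illustration. It also uses a tuple key so that words with equal counts come out alphabetically, which `reverse=True` on the count alone does not guarantee.

```python
def top_words(word_count, limit=15):
    """Return up to `limit` words, most frequent first, ties alphabetical."""
    ranked = sorted(word_count, key=lambda w: (-word_count[w], w))
    return ranked[:limit]  # slicing past the end is safe; it just returns fewer


# Hypothetical sample data standing in for the real counts.
word_count = {'the': 5, 'alice': 3, 'and': 3, 'rabbit': 1}
for k in top_words(word_count, limit=3):
    print(k, word_count[k])
```

If the counts are kept in a `collections.Counter`, `most_common(15)` is a ready-made alternative, though it does not sort ties alphabetically.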
Re: [Tutor] Alice_in_wonderland Question
To figure that last part out I just did a simple if statement.

    for k in sorted(word_count, key=lambda x:word_count[x], reverse=True):
        if word_count[k] >=300:
            print (k, word_count[k])

And the output was correct.

I did have one more question though.

    import os
    from wordtools import extract_words

    source_filepath=input("Enter the path to the source file:")
    dest_filepath =input("Enter the path to the destination file:")

    sourcef=open(source_filepath, 'r')
    destf=open(dest_filepath, 'w')
    for line in sourcef:
        destf.write(line)
    file=input ("Would you like to process another file?(Y/N):")
    if file== "Y":
        source_filepath=input("Enter the path to the source file:")
        dest_filepath =input("Enter the path to the destination file:")
    else:

This code asks the user for a source/dest_filepath. I'm wondering how I can make it so the program can tell if the source/dest_filepath the user entered is actually a file on the computer. Also I have to ask the user if they would like to "process another file(Y/N)?" and I'm not sure where to put that.

On Sun, May 4, 2014 at 10:38 PM, Jake Blank wrote:
> Hi,
>
> I finally got it.
> This was the code:
> for k in sorted(word_count, key=lambda x:word_count[x], reverse=True):
>     print (k, word_count[k])
>
> The only question i have now is how to limit the amount of returns the
> program runs to the first 15 results.
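For the two open questions here, one common pattern is to check the source path with `os.path.isfile` and to wrap the whole prompt-and-process sequence in a `while` loop that ends when the user answers anything other than Y. The sketch below is only one way to arrange it: `ask_for_source`, `main_loop`, the `process` callback and the `ask` parameter are names invented for this example (`ask` defaults to the built-in `input` and exists so the loop can be exercised without a keyboard).

```python
import os


def ask_for_source(ask=input):
    """Keep prompting until the user names a file that actually exists."""
    while True:
        path = ask("Enter the path to the source file:")
        if os.path.isfile(path):
            return path
        print("No such file:", path)


def main_loop(process, ask=input):
    """Call process(source, dest) once per file until the user declines."""
    while True:
        source = ask_for_source(ask)
        # The destination need not exist yet: open(dest, 'w') creates it.
        dest = ask("Enter the path to the destination file:")
        process(source, dest)
        again = ask("Would you like to process another file?(Y/N):")
        if again.strip().upper() != "Y":
            break
```

Here `process` would be a function holding the word-counting and writing steps, so the "process another file" question sits at the bottom of the loop, after one file has been handled, instead of in an `if`/`else` that runs only once.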