[Tutor] NLTK
Hello,

I have successfully downloaded NLTK and the toy grammars. I want to run a few of the packages that come with NLTK on corpora that I have. How do I do this? What commands would I use? The corpora are text files; should I put them in the Python25 folder (is that the so-called "same directory")?

Thanks.
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
[Tutor] Using my own corpus with NLTK
Hi,

Thanks for your response. I tried this and got to the 3rd line. However, when I type in the fourth:

>>> wordlists.fileids()

I get a blank result. When I try the len() function, it only counts the letters in the title of the text document, IM50re.txt. How do I get it to open and analyze the text, as they have done with the Gutenberg texts at the beginning of the chapter? Or, more generally, how does one import a .txt document to analyze in Python? I have downloaded the packages to analyze my data with in Python.

Thank you.
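A note on the len() symptom described above: calling len() on the file name measures the string "IM50re.txt", not the file's contents. The sketch below uses only the standard library and assumes Python 3 (the thread itself uses Python 2.5). The helper name read_words and the sample text are made up for illustration, and a temporary file stands in for the real IM50re.txt.

```python
import os
import tempfile


def read_words(path):
    """Read a text file and split it into whitespace-separated words."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    return text.split()


# A throwaway file stands in for the poster's IM50re.txt.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "IM50re.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("hi how are you\ni am fine\n")

    name_length = len("IM50re.txt")  # length of the file NAME (a string)
    words = read_words(path)         # contents of the file, as a list of words
    word_total = len(words)          # number of words in the file

print("characters in the name:", name_length)
print("words in the file:", word_total)
```

Splitting on whitespace is the crudest possible tokenizer; NLTK's corpus readers do something more careful, but the distinction between the name and the contents is the same.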
Re: [Tutor] NLTK
Hi,

>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root='C:\Users\Ishan\Documents'
>>> wordlists = PlaintextCorpusReader(corpus_root, 'IM50re.txt')
>>> wordlists.fileids()
['IM50re.txt']

This is the result I get. I was wondering how I can use the packages on IM50re.txt? I followed successfully the steps detailed under Using Your Own Corpus. What do I do next, say, if I wanted to use the lemmatizer on this .txt document?

Thank you.
____
From: Kent Johnson
To: Ishan Puri
Cc: *tutor python
Sent: Friday, August 28, 2009 4:24:19 PM
Subject: Re: [Tutor] NLTK

On Fri, Aug 28, 2009 at 6:09 PM, Ishan Puri wrote:
> Hi,
> Thanks for your response. I tried this and got to the 3rd line. However,
> when I type in the fourth:
>
>>>> wordlists.fileids()
>
> a blank comes as a result. When I try the len() function it only counts the
> letters in title of the text document IM50re.txt. How do I get it to open
> and analyze the text, as they have done with the Gutenberg texts at the
> beginning of the chapter?

Did you give the correct path to your files? How did you use len()? It
helps if you show what you tried and what result you got.

Please Reply All to reply to the list.

Kent

> Thank you.
>
> From: Kent Johnson
> To: Ishan Puri
> Cc: Python Tutor
> Sent: Friday, August 28, 2009 4:49:40 AM
> Subject: Re: [Tutor] NLTK
>
> On Fri, Aug 28, 2009 at 3:14 AM, Ishan Puri wrote:
>> Hello,
>> I have successfully downloaded NLTK and the toy grammars. I want to run
>> a few of the packages that come with NLTK on corpora that I have. How do I
>> do this? What commands would I use? The corpora are text files; should I
>> put them in the Python25 folder (is that the so called same directory)?
>
> The section Loading your own Corpus in the book seems to show what you want:
> http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html
>
> Kent
Re: [Tutor] NLTK
Hi,

Thanks for the confirmation. IM50re.txt is a plain text corpus. Let us say that we want to count the words in this corpus. In the NLTK book, there is an example.

>>> import nltk
>>> nltk.corpus.gutenberg.fileids()
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt',
'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt',
'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt',
'chesterton-brown.txt', 'chesterton-thursday.txt', 'edgeworth-parents.txt',
'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt',
'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']

These are the texts that come with NLTK.

>>> emma = nltk.corpus.gutenberg.words('austen-emma.txt')
>>> len(emma)
192427

So this is the number of words in one particular text, 'austen-emma.txt'. How would I do this with my IM50re.txt? It seems the code "nltk.corpus.gutenberg.words" is specific to the Gutenberg corpus installed with NLTK. Many examples like this are given for the different analyses that can be done with NLTK. However, they all seem to be specific to one of the texts above or to another one already installed with NLTK. I am not sure how to apply these examples to my own corpus.

Thank you. You are my only source of help right now; I have been trying to figure this out all day.

From: Kent Johnson
To: Ishan Puri
Cc: *tutor python
Sent: Friday, August 28, 2009 7:03:15 PM
Subject: Re: [Tutor] NLTK

On Fri, Aug 28, 2009 at 7:29 PM, Ishan Puri wrote:
> Hi,
>>>> from nltk.corpus import PlaintextCorpusReader
>>>> corpus_root='C:\Users\Ishan\Documents'
>>>> wordlists = PlaintextCorpusReader(corpus_root, 'IM50re.txt')
>>>> wordlists.fileids()
> ['IM50re.txt']
>
> This is the result I get.

That seems to be working then. You should be able to get a list of words with
wordlists.words('IM50re.txt')

> I was wondering how I can use the packages on
> IM50re.txt? I followed successfully the steps detailed under Using Your Own
> Corpus. What do I do next, say, if I wanted to use the lemmatizer on this
> .txt document?

I have no idea. Is IM50re.txt a plain text corpus? What is a package?
What is a lemmatizer? I don't know anything about NLTK, I'm just good
at reading manuals. You have to give me more help than that. What have
you tried? Can you find an example that is similar to what you want to
do? Don't assume I know what you are talking about :-)

Kent
Re: [Tutor] NLTK
Hi,

Yes! It works! I guess I am asking how did you know to use wordlists.words('IM50re.txt')? Is this a specific command, as I believe it was not in the book?

Thanks.

From: Kent Johnson
To: Ishan Puri
Cc: *tutor python
Sent: Saturday, August 29, 2009 3:34:09 AM
Subject: Re: [Tutor] NLTK

On Fri, Aug 28, 2009 at 10:16 PM, Ishan Puri wrote:
>>>> emma = nltk.corpus.gutenberg.words('austen-emma.txt')
>>>> len(emma)
> 192427
>
> So this is the number of words in a particular 'austen-emma.txt'. How would
> I do this with my IM50re.txt? It seems the code "nltk.corpus.gutenberg.words"
> is specific to some Gutenberg corpus installed with NLTK.
> Like this many examples are given for different analyses that can be done
> with NLTK. However they all seem to be specific to one of the texts above or
> another one already installed with NLTK. I am not sure how to apply these
> examples to my own corpus.

This is pretty much the next line in the "Loading your own Corpus"
example. After
>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root='C:\Users\Ishan\Documents'
>>> wordlists = PlaintextCorpusReader(corpus_root, 'IM50re.txt')
>>> wordlists.fileids()
['IM50re.txt']

you should be able to do
my_words = wordlists.words('IM50re.txt')
len(my_words)

Kent
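For readers following along, the steps from this exchange can be collected into one short script. This is only a sketch under a few assumptions: it targets Python 3 with NLTK installed (the thread uses Python 2.5), the file name my_corpus.txt and the helper count_tokens are invented for the demo, and a temporary folder stands in for C:\Users\Ishan\Documents. If NLTK cannot be imported, the script reports None instead of a count.

```python
import os
import tempfile

try:
    from nltk.corpus import PlaintextCorpusReader
except ImportError:
    # NLTK is not installed here; the demo below then skips the count.
    PlaintextCorpusReader = None


def count_tokens(corpus_root, fileid):
    """Return the number of tokens NLTK finds in one plain-text file."""
    reader = PlaintextCorpusReader(corpus_root, [fileid])
    return len(reader.words(fileid))


token_count = None
with tempfile.TemporaryDirectory() as corpus_root:
    # "my_corpus.txt" is a made-up stand-in for a file such as IM50re.txt.
    sample_path = os.path.join(corpus_root, "my_corpus.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("Hi, how are you?\n")
    if PlaintextCorpusReader is not None:
        token_count = count_tokens(corpus_root, "my_corpus.txt")

print("tokens:", token_count)
```

Note that words() returns tokens rather than whitespace-separated words: with the reader's default tokenizer, punctuation marks such as the comma and the question mark are counted as separate items, so the total is usually higher than a plain split() would give.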
[Tutor] Easy Problem
Hello,

I have 2 plain text documents that have uneven spacing. I need to make these single-spaced between lines and between words. Basically, I need to get them to be equal character length after I collapse the uneven spacing. Is there a simple command in Python to do this for a text file? How do I do this?

E.G.: Hi how are you?
Fixed: Hi how are you?

Thanks,
Ishan
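One way to approach the spacing question, sketched with the standard library only and assuming Python 3: split each line on any run of whitespace, rejoin the pieces with single spaces, and drop lines that end up empty. The function name normalize_spacing and the sample string are made up for illustration; "single spaced between lines" is read here as "no blank lines", which is an interpretation of the request rather than something the post states.

```python
def normalize_spacing(text):
    """Collapse runs of spaces/tabs to one space and remove blank lines."""
    cleaned_lines = []
    for line in text.splitlines():
        # str.split() with no argument splits on any run of whitespace
        # and discards leading/trailing whitespace.
        collapsed = " ".join(line.split())
        if collapsed:  # skip lines that were empty or whitespace-only
            cleaned_lines.append(collapsed)
    return "\n".join(cleaned_lines)


messy = "Hi   how  are\t you?\n\n\nFine,    thanks.  \n"
tidy = normalize_spacing(messy)
print(tidy)
```

To apply this to a file, read the whole file into a string with open(...).read(), pass it through normalize_spacing, and write the result to a new file so that the original is kept.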
[Tutor] Word Frequency Chart
Hello,

I am a beginner with Python, but I understand a lot of linguistics. I am a high school student. I need help (from the beginning) making a word frequency chart that I can use to chart out the numerical frequencies of words. Usually I can understand code if it is annotated well, but I am not familiar with the functions and so forth. It would be awesome if someone could make a program that would chart frequencies for me. I have the corpora already. If the program could show something like "Type filename:" when I run it, that would be fantastic. I have pylab as well, so the code could include something that would make the chart when I type in the filename. That would be great.

Thank you.
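A possible starting point for the frequency chart, offered as a sketch rather than a finished program. It assumes Python 3 and uses only the standard library, so the "chart" is drawn as rows of # marks in the terminal instead of with pylab. The function names word_frequencies, text_chart and chart_file are invented here, and the rule for what counts as a word (runs of letters and apostrophes, lowercased) is a simplifying assumption.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercase word occurs in a string."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def text_chart(counts, top=10):
    """Return one line per word: the word, its count, and a bar of '#' marks."""
    rows = []
    for word, count in counts.most_common(top):
        rows.append("%-12s %4d %s" % (word, count, "#" * count))
    return "\n".join(rows)


def chart_file(filename, top=10):
    """Build the text chart for the contents of a file."""
    with open(filename, encoding="utf-8") as handle:
        return text_chart(word_frequencies(handle.read()), top)


sample = "the cat saw the dog and the dog saw the cat"
counts = word_frequencies(sample)
chart = text_chart(counts, top=3)
print(chart)
```

To get the requested prompt, one could call print(chart_file(input("Type filename: "))) at the end of the script. With pylab available, the counts from most_common() could instead be passed to pylab.bar() to draw a graphical bar chart.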
[Tutor] NLTK
Hi,

I have downloaded NLTK for Python 2.5. It installed automatically to C:\Program Files\Python25\libs\site-packages\nltk. When I try to open a module in Python, it says that no such module exists. What do I need to do?
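When Python reports that a module does not exist, a useful first check is whether the folder containing the package is on the interpreter's search path. The sketch below assumes Python 3 (importlib.util.find_spec is not available in Python 2.5, where printing sys.path is still a valid check). The helper name is_importable is made up for illustration.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if Python can locate a top-level module by this name."""
    return importlib.util.find_spec(module_name) is not None


# The folders Python searches, in order, when it runs an import statement.
for folder in sys.path:
    print(folder)

print("nltk importable:", is_importable("nltk"))
```

If the folder that holds the nltk package (its parent site-packages directory) is not among the printed paths, the package was installed somewhere this interpreter does not look, which would explain the error.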
[Tutor] Corpora
Hi,

I was wondering if anyone could tell me where I can get corpora containing IMs, blogs, or any other internet communication? This is kind of urgent.

Thanks.