[Tutor] NLTK

2009-08-28 Thread Ishan Puri
Hello,
I have successfully downloaded NLTK and the toy grammars. I want to run a
few of the packages that come with NLTK on corpora that I have. How do I do
this? What commands would I use? The corpora are text files; should I put
them in the Python25 folder (is that the so-called "same directory")?
Thanks. 
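The corpus files do not have to live in the Python25 folder. Any folder works as long as its path is passed in. The sketch below is Python 3 and standard library only, and `list_corpus_files` is a hypothetical helper name. It lists the `.txt` files in a folder, which is roughly the kind of list NLTK's `PlaintextCorpusReader(...).fileids()` reports for a matching file pattern.

```python
from pathlib import Path


def list_corpus_files(corpus_root, pattern="*.txt"):
    """Return the sorted names of files in corpus_root that match pattern.

    Hypothetical helper: the folder can be anywhere on disk; it does not
    need to be inside the Python installation directory.
    """
    root = Path(corpus_root)
    # glob() only looks in this folder (not subfolders) for "*.txt".
    return sorted(p.name for p in root.glob(pattern) if p.is_file())
```

For example, `list_corpus_files(r"C:\Users\Ishan\Documents")` would list the text files in that folder. The `r` prefix keeps the backslashes in the Windows path from being read as escape sequences.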
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Using my own corpus with NLTK

2009-08-28 Thread Ishan Puri
Hi,
Thanks for your response. I tried this and got to the third line. However,
when I type in the fourth:

>>> wordlists.fileids()

the result is blank. When I try the len() function, it only counts the
letters in the title of the text document IM50re.txt. How do I get it to open
and analyze the text, as they have done with the Gutenberg texts at the
beginning of the chapter?

Or, more generally, how does one import a .txt document to analyze in Python?
I have downloaded the packages I need to analyze my data in Python.

Thank you.
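Calling len() on a file name only measures the name string, so it counts the letters of "IM50re.txt". To analyze the text itself, the file has to be opened and read first. The following is a minimal standard-library sketch in Python 3. The function name `read_words` and the UTF-8 default encoding are assumptions.

```python
def read_words(path, encoding="utf-8"):
    """Read a plain text file and return its whitespace-separated words.

    The encoding is an assumption; use whatever the file was saved in
    (for example "latin-1" for many older Windows files).
    """
    with open(path, encoding=encoding) as f:
        text = f.read()
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines); punctuation stays attached to words.
    return text.split()
```

With this, `len(read_words(r"C:\Users\Ishan\Documents\IM50re.txt"))` counts words in the file rather than characters in its name.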


Re: [Tutor] NLTK

2009-08-28 Thread Ishan Puri
Hi,
>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root=r'C:\Users\Ishan\Documents'
>>> wordlists = PlaintextCorpusReader(corpus_root, 'IM50re.txt')
>>> wordlists.fileids()
['IM50re.txt']

This is the result I get. How can I use the packages on IM50re.txt? I
successfully followed the steps under "Using Your Own Corpus". What do I do
next if, say, I want to use the lemmatizer on this .txt document?

Thank you.

____
From: Kent Johnson 
To: Ishan Puri 
Cc: *tutor python 
Sent: Friday, August 28, 2009 4:24:19 PM
Subject: Re: [Tutor] NLTK

On Fri, Aug 28, 2009 at 6:09 PM, Ishan Puri wrote:
> Hi,
> Thanks for your response. I tried this and got to the 3rd line. However,
> when I type in the fourth:
>
>>>> wordlists.fileids()
>
> a blank comes as a result. When I try the len() function it only counts the
> letters in title of the text document IM50re.txt. How do I get it to open
> and analyze the text, as they have done with the Gutenberg texts at the
> beginning of the chapter?

Did you give the correct path to your files? How did you use len()? It
helps if you show what you tried and what result you got.

Please Reply All to reply to the list.

Kent

> Thank you.
>
>
>
> 
> From: Kent Johnson 
> To: Ishan Puri 
> Cc: Python Tutor 
> Sent: Friday, August 28, 2009 4:49:40 AM
> Subject: Re: [Tutor] NLTK
>
> On Fri, Aug 28, 2009 at 3:14 AM, Ishan Puri wrote:
>> Hello,
>> I have successfully downloaded NLTK and the toy grammars. I want to run
>> a few of the packages that come with NLTK on corpora that I have. How do I
>> do this? What commands would I use? The corpora are text files; should I
>> put them in the Python25 folder (is that the so called same directory)?
>
> The section Loading your own Corpus in the book seems to show what you want:
> http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html
>
> Kent
>


Re: [Tutor] NLTK

2009-08-28 Thread Ishan Puri
Hi,
Thanks for the confirmation. IM50re.txt is a plain text corpus. Let us say 
that we want to count the words in this corpus. In the NLTK book, there is an 
example.

>>> import nltk
>>> nltk.corpus.gutenberg.fileids()
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt',
'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.txt',
'carroll-alice.txt', 'chesterton-ball.txt', 'chesterton-brown.txt',
'chesterton-thursday.txt', 'edgeworth-parents.txt', 'melville-moby_dick.txt',
'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt',
'shakespeare-macbeth.txt', 'whitman-leaves.txt']

These are the texts that come with NLTK.

>>> emma = nltk.corpus.gutenberg.words('austen-emma.txt')
>>> len(emma)
192427

So this is the number of words in 'austen-emma.txt'. How would I do this with
my IM50re.txt? It seems that nltk.corpus.gutenberg.words is specific to the
Gutenberg corpus installed with NLTK. Many examples like this are given for
different analyses that can be done with NLTK, but they all seem to be
specific to one of the texts above or to another corpus already installed
with NLTK. I am not sure how to apply these examples to my own corpus.

Thank you. You are my only source of help right now; I have been trying to
figure this out all day.


From: Kent Johnson 
To: Ishan Puri 
Cc: *tutor python 
Sent: Friday, August 28, 2009 7:03:15 PM
Subject: Re: [Tutor] NLTK

On Fri, Aug 28, 2009 at 7:29 PM, Ishan Puri wrote:
> Hi,
>>>> from nltk.corpus import PlaintextCorpusReader
>>>> corpus_root=r'C:\Users\Ishan\Documents'
>>>> wordlists = PlaintextCorpusReader(corpus_root, 'IM50re.txt')
>>>> wordlists.fileids()
> ['IM50re.txt']
>
> This is the result I get.

That seems to be working then. You should be able to get a list of words with
wordlists.words('IM50re.txt')

> I was wondering how I can use the packages on
> IM50re.txt? I followed successfully the steps detailed under Using Your Own
> Corpus. What do I do next, say, if I wanted to use the lemmatizer on this
> .txt document?

I have no idea. Is IM50re.txt a plain text corpus? What is a package?
What is a lemmatizer?

I don't know anything about NLTK, I'm just good at reading manuals.
You have to give me more help than that. What have you tried? Can you
find an example that is similar to what you want to do? Don't assume I
know what you are talking about :-)

Kent


Re: [Tutor] NLTK

2009-08-29 Thread Ishan Puri
Hi,
Yes! It works! How did you know to use wordlists.words('IM50re.txt')? Is
this a specific command? I don't believe it was in the book.
Thanks.
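One general way to discover what an object such as `wordlists` can do is to ask Python directly. `dir()` lists an object's attributes, and `help()` prints its documentation (for example `help(wordlists.words)`). The helper below, `public_methods`, is a hypothetical name. It filters `dir()` down to the public callables. It is shown here on a plain string so it runs without NLTK, but `public_methods(wordlists)` should work the same way and would list the reader's methods, such as `words`.

```python
def public_methods(obj):
    """Return the sorted names of obj's public callable attributes."""
    return sorted(
        name
        for name in dir(obj)
        # Skip "private" and special names such as __len__.
        if not name.startswith("_") and callable(getattr(obj, name))
    )
```

For instance, `public_methods("some text")` includes names like `split` and `lower`.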

From: Kent Johnson 
To: Ishan Puri 
Cc: *tutor python 
Sent: Saturday, August 29, 2009 3:34:09 AM
Subject: Re: [Tutor] NLTK

On Fri, Aug 28, 2009 at 10:16 PM, Ishan Puri wrote:

>>>> emma = nltk.corpus.gutenberg.words('austen-emma.txt')
>>>> len(emma)
> 192427
>
> So this is the number of words in a particular 'austen-emma.txt'. How would
> I do this with my IM50re.txt? It seems the code "nltk.corpus.gutenberg.words"
> is specific to some Gutenberg corpus installed with NLTK.
> Like this many examples are given for different analyses that can be done
> with NLTK. However they all seem to be specific to one of the texts above or
> another one already installed with NLTK. I am not sure how to apply these
> examples to my own corpus.

This is pretty much the next line in the "Loading your own Corpus"
example. After
>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root=r'C:\Users\Ishan\Documents'
>>> wordlists = PlaintextCorpusReader(corpus_root, 'IM50re.txt')
>>> wordlists.fileids()
['IM50re.txt']

you should be able to do
>>> my_words = wordlists.words('IM50re.txt')
>>> len(my_words)

Kent
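To see roughly what `len(my_words)` counts, the sketch below splits text into runs of word characters and runs of punctuation. This is an assumption modeled on the pattern `\w+|[^\w\s]+` of NLTK's `WordPunctTokenizer`, which, as far as I know, is the default word tokenizer for `PlaintextCorpusReader`. Exact counts from NLTK may differ. The names `TOKEN_PATTERN` and `tokenize` are hypothetical.

```python
import re

# A run of letters/digits/underscores, or a run of other non-space characters.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]+")


def tokenize(text):
    """Split text into word tokens and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Hi, how are you? I'm fine.")
print(tokens)
print(len(tokens))  # 11
```

Note that under this pattern punctuation marks count as tokens and "I'm" becomes three tokens ("I", "'", "m"). This is why a word count from a tokenizer is usually larger than a count of space-separated words.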


[Tutor] Easy Problem

2009-08-31 Thread Ishan Puri
Hello,
I have two plain text documents with uneven spacing. I need to make them
single spaced, both between lines and between words. Basically, I need them
to be of equal character length once the uneven spacing is collapsed. Is
there a simple command in Python that does this for a text file? How do I do
this?
E.g.: Hi  how are you?
Fixed: Hi how are you?
Thanks,
Ishan
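There is no single built-in command that fixes a whole file. However, calling `split()` with no arguments breaks a line on any run of whitespace, and `" ".join()` glues the pieces back together with single spaces. The sketch below is Python 3, and the function names and UTF-8 encoding are assumptions. It applies that idiom to every line and also drops blank lines, which is one reading of "single spaced between lines".

```python
def normalize_spacing(text):
    """Collapse runs of spaces/tabs to one space and drop blank lines."""
    # " ".join(line.split()) also trims leading and trailing whitespace.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def normalize_file(src_path, dst_path, encoding="utf-8"):
    """Write a copy of src_path with normalized spacing to dst_path."""
    with open(src_path, encoding=encoding) as src:
        cleaned = normalize_spacing(src.read())
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.write(cleaned + "\n")


print(normalize_spacing("Hi  how are you?"))  # Hi how are you?
```

After running `normalize_file` on both documents, `len()` of each file's contents can be compared directly.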


[Tutor] Word Frequency Chart

2008-12-23 Thread Ishan Puri
Hello,
I am a beginner with Python, but I understand a lot of linguistics. I am a
high school student. I need help, starting from the beginning, making a word
frequency chart that shows the numerical frequencies of words. I can usually
understand code if it is well annotated, but I am not familiar with the
functions and so forth. It would be awesome if someone could write a program
that charts frequencies for me; I already have the corpora. It would be
fantastic if the program asked "Type filename:" when I run it. I have pylab
as well, so the code could include something that makes the chart once I type
in the filename. That would be great.
Thank you. 
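Below is a minimal sketch of such a program, assuming Python 3 with matplotlib installed for the chart (pylab is part of matplotlib). The function names are hypothetical. The word pattern (lowercase ASCII letters, digits, apostrophes) and the UTF-8 file encoding are assumptions that you may need to adjust for your corpora. Counting is done with `collections.Counter` from the standard library.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercased word occurs in text."""
    # Assumption: a "word" is a run of a-z, 0-9 or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


def plot_frequencies(counts, top_n=20):
    """Draw a bar chart of the top_n most common words (needs matplotlib)."""
    import matplotlib.pyplot as plt  # pylab exposes these same functions

    top = counts.most_common(top_n)
    labels = [word for word, _ in top]
    values = [n for _, n in top]
    plt.bar(range(len(values)), values)
    plt.xticks(range(len(labels)), labels, rotation=90)
    plt.ylabel("Frequency")
    plt.tight_layout()
    plt.show()


def main():
    """Ask for a file name, print the top words, then chart them."""
    filename = input("Type filename: ")
    with open(filename, encoding="utf-8") as f:
        counts = word_frequencies(f.read())
    for word, n in counts.most_common(20):
        print(word, n)
    plot_frequencies(counts)
```

To use it, save the code as a file, add a last line calling `main()`, and run it. Keeping the counting separate from the plotting means the counts can be checked, or written to a file, without matplotlib.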


[Tutor] NLTK

2009-01-14 Thread Ishan Puri
Hi,
I have downloaded NLTK for Python 2.5. It was installed automatically to
C:\Program Files\Python25\libs\site-packages\nltk. When I try to import a
module in Python, it says that no such module exists. What do I need to do?
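A common cause is that the package sits in a folder the running interpreter does not search, or that a different Python installation is being started. The diagnostic sketch below needs Python 3.4+ because it uses `importlib.util.find_spec`, and the helper name `can_import` is hypothetical. It prints which interpreter is running and the folders it searches for modules. On a standard Windows install, Python looks in `Lib\site-packages` inside its own folder; the separate `libs` folder holds C link libraries. If the path above is accurate, compare it with the printed `sys.path`.

```python
import importlib.util
import sys


def can_import(module_name):
    """Return True if a top-level module can be found on sys.path."""
    return importlib.util.find_spec(module_name) is not None


print(sys.version)      # which Python version is running
print(sys.executable)   # which interpreter binary was started
for entry in sys.path:  # folders Python searches when importing
    print(entry)
print("nltk importable:", can_import("nltk"))
```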


[Tutor] Corpora

2009-01-15 Thread Ishan Puri
Hi,
I was wondering if anyone could tell me where I can get corpora containing
IMs, blogs, or any other Internet communication. This is kind of urgent.
Thanks.