On Fri, Aug 28, 2009 at 10:16 PM, Ishan Puri <ballerz4i...@sbcglobal.net> wrote:
> >>> emma = nltk.corpus.gutenberg.words('austen-emma.txt')
> >>> len(emma)
> 192427
>
> So this is the number of words in a particular 'austen-emma.txt'. How would
> I do this with my IM50re.txt? It seems the code
> "nltk.corpus.gutenberg.words" is specific to some Gutenberg corpus
> installed with NLTK. Many examples like this are given for different
> analyses that can be done with NLTK. However, they all seem to be specific
> to one of the texts above or another one already installed with NLTK. I am
> not sure how to apply these examples to my own corpus.

This is pretty much the next line in the "Loading your own Corpus" example.
After

>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root = r'C:\Users\Ishan\Documents'
>>> wordlists = PlaintextCorpusReader(corpus_root, 'IM50re.txt')
>>> wordlists.fileids()
['IM50re.txt']

you should be able to do

>>> my_words = wordlists.words('IM50re.txt')
>>> len(my_words)

Kent
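
PS. One thing worth knowing about the number that len() gives you: as far as I know, PlaintextCorpusReader splits text with NLTK's WordPunctTokenizer by default, so punctuation marks are counted as tokens alongside the words. The sketch below imitates that with the standard library only, so you can sanity-check a count without NLTK. Treat the regular expression as my assumption about the default tokenizer; the helper name count_tokens and the sample sentence are made up for illustration and stand in for IM50re.txt.

```python
import os
import re
import tempfile

# Assumption: this pattern mirrors NLTK's WordPunctTokenizer, which I believe
# is the default word tokenizer used by PlaintextCorpusReader.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]+")


def count_tokens(path, encoding="utf-8"):
    """Count word and punctuation tokens in a plain text file (hypothetical helper)."""
    with open(path, encoding=encoding) as handle:
        return len(TOKEN_PATTERN.findall(handle.read()))


# Made-up sample text standing in for your own corpus file.
sample = "Emma Woodhouse, handsome, clever, and rich."

with tempfile.TemporaryDirectory() as corpus_root:
    sample_path = os.path.join(corpus_root, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write(sample)
    token_count = count_tokens(sample_path)

# Six words plus three commas and one full stop.
print(token_count)
```

If the figure from len(my_words) looks higher than a word processor's word count for the same file, the punctuation tokens are the likely reason.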