NLTK and package structure

2011-10-27 Thread Steven Bird
The Natural Language Toolkit (NLTK) is a suite of open source Python
packages for natural language processing, available at
http://nltk.org/, together with an O'Reilly book which is available
online for free.  Development is now hosted at http://github.com/nltk
-- get it here: git@github.com:nltk/nltk.git

I am seeking advice on how to speed up our import process.  The
contents of several sub-packages are made available at the top level,
for the convenience of programmers and so that the examples published
in the book are more concise.  This has been done by having lots of
"from subpackage import *" in the top-level __init__.py.  Some of
these imports are lazy.  Unfortunately, any import of nltk leads to
cascading imports which pull in most of the library, unacceptably
slowing down the load time.

https://github.com/nltk/nltk/blob/master/nltk/__init__.py
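
To make the cascade described above visible, one could record which
modules a single import adds to sys.modules and how long it takes.
This is only a rough sketch: measure_import is a hypothetical helper,
not part of NLTK, and "json" stands in for "nltk" so that the snippet
runs without NLTK installed.

```python
import importlib
import sys
import time


def measure_import(module_name):
    """Import module_name; return (seconds taken, newly loaded module names)."""
    before = set(sys.modules)
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    # Everything that appeared in sys.modules was pulled in by this import.
    pulled_in = sorted(set(sys.modules) - before)
    return elapsed, pulled_in


# "json" is a stand-in; substitute "nltk" to inspect the real case.
elapsed, pulled_in = measure_import("json")
print("%.4f s, %d new modules" % (elapsed, len(pulled_in)))
```

The result is only meaningful in a fresh interpreter, since modules
that are already loaded are not counted again.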

I am looking for a solution that meets the following requirements:
1) import nltk is as fast as possible
2) published code examples are not broken
(or are easily fixed by calling nltk.load_subpackages() before the
rest of the code)
3) popular names from subpackages are available at the top level
(e.g. nltk.probability.ConditionalFreqDist as nltk.ConditionalFreqDist)
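
One possible way to meet these three requirements is a module object
that resolves top-level names on first access.  The sketch below is an
assumption, not NLTK's actual code: LazyNamespace and load_all are
hypothetical names (load_all plays the role of the load_subpackages()
mentioned in requirement 2), and standard-library names stand in for
NLTK ones so that it runs anywhere.

```python
import importlib
import types


class LazyNamespace(types.ModuleType):
    """Module object that resolves selected top-level names on first access."""

    def __init__(self, name, lazy_names):
        super().__init__(name)
        # Maps exported name -> (module path, attribute name).
        self._lazy_names = dict(lazy_names)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        table = self.__dict__.get("_lazy_names", {})
        if name not in table:
            raise AttributeError(name)
        module_path, attr = table[name]
        value = getattr(importlib.import_module(module_path), attr)
        setattr(self, name, value)  # cache so later lookups are direct
        return value

    def load_all(self):
        """Eagerly resolve every lazy name (hypothetical load_subpackages())."""
        for name in list(self._lazy_names):
            getattr(self, name)


# Stand-in table; for NLTK an entry might look like
# "ConditionalFreqDist": ("nltk.probability", "ConditionalFreqDist").
toolkit = LazyNamespace("toolkit", {
    "OrderedDict": ("collections", "OrderedDict"),
    "dumps": ("json", "dumps"),
})

print("dumps" in vars(toolkit))   # nothing resolved yet
print(toolkit.dumps([1]))         # first access triggers the import
print("dumps" in vars(toolkit))   # now cached on the module object
```

In a real package, an instance like this would have to replace the
package's entry in sys.modules; that step is left out here.  Code that
relies on "from nltk import *" is not covered by this sketch and would
probably still need the eager fallback.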

The existing discussion of this issue amongst our developers is posted here:
http://code.google.com/p/nltk/issues/detail?id=378

Our practice in structuring subpackages is described here:
http://code.google.com/p/nltk/wiki/PackageStructure

Thanks for any advice.

-Steven Bird (NLTK Project coordinator)
-- 
http://mail.python.org/mailman/listinfo/python-list


NLTK: Natural language processing in Python

2007-05-25 Thread Steven Bird
NLTK — the Natural Language Toolkit — is a suite of open source Python
modules, data sets and tutorials supporting research and development
in natural language processing.  It comes with 50k lines of code,
300 MB of datasets, and a 360-page book which teaches both Python and
natural language processing.  NLTK has been adopted in at least 40
university courses.  NLTK is hosted on SourceForge, and is ranked in
the top 200 projects.

http://nltk.sourceforge.net/


Quotes -- what users have said about NLTK:

"... the quite remarkable Natural Language Toolkit (NLTK), a wonderful
tool for teaching, and working in, computational linguistics using
Python."
http://www.ibm.com/developerworks/linux/library/l-cpnltk.html

"Natural Language Toolkit (nltk) is an amazing library to play with
natural language."
http://www.biais.org/blog/index.php/2007/01/31/25-spelling-correction-using-the-python-natural-language-toolkit-nltk

"... a wonderful lightweight framework that provides a wealth of NLP tools."
http://harnly.net/2007/blog/geek/lang/ruby/nltks-ing-words-variations/

"A good place to start for those learning about NLP for the first
time, this has been used in many academic situations. It is extremely
well documented, with tutorials which not only explain the tool, but
also give an overview of the subject (eg document clustering). I was
able to go from downloading it for the first time, to creating and
training a 2004 Task 1A system (bigram gene name tagger) in about an
hour."
http://compbio.uchsc.edu/corpora/bcresources.html

"Students with no previous programming experience will be able to
spend more of their time thinking about the logical steps involved in
getting the computer to process language data, and less time mastering
and using the arcana involved in getting the computer to do anything
at all."
http://linguistlist.org/issues/14/14-3165.html

Steven Bird
http://www.csse.unimelb.edu.au/~sb/