On Oct 27, 2008, at 6:10 PM, Grant Ingersoll wrote:

Warning: shameless plug: Tom Morton and I have a chapter on NER and OpenNLP (and Solr, for that matter) in our book "Taming Text" (Manning) and the code will be open once we have a place to put it (hopefully soon). In fact, you'll see us doing a lot of this kind of stuff w/ Solr and it should all be coming back to Solr/Lucene/Mahout at some point (for instance, see https://issues.apache.org/jira/browse/SOLR-769 , as I'm sure FAST told you they can do clustering, too!)
--end shameless plug ---


That's great!

I just got the MEAP copy, and it looks really good:
http://www.manning.com/ingersoll/


As for Mahout, NER is a classification problem, and there are some tools in Mahout to do classification, but nothing specifically targeted at NER at the moment. Mahout, like Nutch, also takes advantage of Hadoop for scaling. The combination of Mahout in Solr makes a lot of sense, IMO.


Perhaps this is more appropriate to ask on the Mahout list, but... when you say "Mahout, like Nutch, also takes advantage of Hadoop for scaling", does that mean that much of Mahout requires Hadoop? Is it possible to run smaller-scale problems on a simple setup and only invoke Hadoop when required?

ryan


