Re: Entity extraction?

Grant Ingersoll Mon, 27 Oct 2008 19:10:36 -0700


On Oct 27, 2008, at 8:53 PM, Ryan McKinley wrote:

On Oct 27, 2008, at 6:10 PM, Grant Ingersoll wrote:
Warning: shameless plug: Tom Morton and I have a chapter on NER and OpenNLP (and Solr, for that matter) in our book "Taming Text" (Manning) and the code will be open once we have a place to put it (hopefully soon). In fact, you'll see us doing a lot of this kind of stuff w/ Solr and it should all be coming back to Solr/Lucene/Mahout at some point (for instance, see https://issues.apache.org/jira/browse/SOLR-769, as I'm sure FAST told you they can do clustering, too!)
--end shameless plug ---
That's great!

I just got the MEAP copy, it looks really good
http://www.manning.com/ingersoll/


Thanks!

As for Mahout, NER is a classification problem, and there are some tools in Mahout to do classification, but nothing specifically targeted at NER at the moment. Mahout, like Nutch, also takes advantage of Hadoop for scaling. The combination of Mahout in Solr makes a lot of sense, IMO.
Perhaps this is more appropriate to ask on the mahout list, but... when you say "Mahout, like Nutch, also takes advantage of Hadoop for scaling", does that mean that much of Mahout requires hadoop? Is it possible to do smaller scale problems on a simple setup and only invoke hadoop when required?

Yes, probably better asked on Mahout, but to answer your question, yes, most of the implementations require Hadoop so far, but it is not a strict requirement. That being said, it is fairly easy to run them on a simple setup (i.e. single node).
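
To make the "NER is a classification problem" point above a bit more concrete, here is a rough sketch of labeling each token with a class based on a few features. It is plain Python, not Mahout or OpenNLP code; the feature set, the label names (PERSON, PRODUCT, O), the toy training data and the simple vote-counting classifier are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def token_features(tokens, i):
    """Features for the token at position i (a made-up, minimal feature set)."""
    tok = tokens[i]
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    return [
        "word=" + tok.lower(),
        "is_cap=" + str(tok[:1].isupper()),
        "prev=" + prev_tok,
    ]


def train(examples):
    """Count how often each feature co-occurs with each label."""
    counts = defaultdict(Counter)
    for tokens, labels in examples:
        for i, label in enumerate(labels):
            for feat in token_features(tokens, i):
                counts[feat][label] += 1
    return counts


def classify(counts, tokens):
    """Give each token the label its features voted for most often."""
    result = []
    for i in range(len(tokens)):
        votes = Counter()
        for feat in token_features(tokens, i):
            # .get avoids creating entries for unseen features
            votes.update(counts.get(feat, {}))
        result.append(votes.most_common(1)[0][0] if votes else "O")
    return result


# Toy training data (hypothetical): one label per token, "O" = not an entity.
training = [
    (["Grant", "wrote", "about", "Solr"], ["PERSON", "O", "O", "PRODUCT"]),
    (["Ryan", "asked", "about", "Mahout"], ["PERSON", "O", "O", "PRODUCT"]),
]
model = train(training)
labels = classify(model, ["Tom", "wrote", "about", "Hadoop"])
print(labels)
```

A real system would swap the vote counting for a proper classifier (maxent, naive Bayes, etc.) and use far richer features, but the shape of the problem stays the same: tokens in, one label per token out.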
