Hi George,

Thank you for your kind words about Lucene in Action. :)

I wouldn't compare Solr and Nutch; they are really made for different things. I was suggesting Nutch instead of Heritrix, not instead of Solr. The Solr+Nutch patch is in JIRA, and there is a fresh patch in there... still warm, try it out.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: George Everitt <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, November 22, 2007 10:58:08 PM
Subject: Re: Heritrix and Solr

Otis:

There are many reasons I prefer Solr to Nutch:

1. I actually tried to do some of the crawling with Nutch, but found the crawling options less flexible than I would have liked.
2. I prefer the Solr approach in general. I have a long background in Verity and Autonomy search, and Solr is a bit closer to them than Nutch.
3. I really like the schema support in Solr.
4. I really really like the facets/parametric search in Solr.
5. I really really really like the REST interface in Solr.
6. Finally, and not to put too fine a point on it, Hadoop frightens the bejeebers out of me. I've skimmed some of the papers and it looks like a lot of study before I will fully understand it. I'm not saying I'm stupid and lazy, but if the map-reduce algorithm fits, I'll wear it. Plus, I'm trying to get a mental handle on Jeff Hawkins' HTM and its application to the real world. It all makes my cerebral cortex itchy.

Thanks for the suggestion, though. I'll probably revisit Nutch again if Heritrix lets me down. I had no luck getting the Nutch crawler Solr patch to work, either. Sadly, I'm the David Lee Roth of Java programmers - I may think that I'm hard-core, but I'm not, really. And my groupies are getting a bit saggy.

BTW - add my voice to the paeans of praise for Lucene in Action. You and Erik did a bang-up job, and I surely appreciate all the feedback you give on this forum, especially over the past few months as I feel my way through Solr and Lucene.

On Nov 22, 2007, at 10:10 PM, Otis Gospodnetic wrote:

> The answer to that question, Norberto, would depend on versions.
>
> George: why not just use straight Nutch and forget about Heritrix?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Norberto Meijome <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Cc: [EMAIL PROTECTED]
> Sent: Thursday, November 22, 2007 5:54:32 PM
> Subject: Re: Heritrix and Solr
>
> On Thu, 22 Nov 2007 10:41:41 -0500
> George Everitt <[EMAIL PROTECTED]> wrote:
>
>> After a lot of googling, I came across Heritrix, which seems to be the
>> most robust well supported open source crawler out there. Heritrix
>> has an integration with Nutch (NutchWax), but not with Solr. I'm
>> wondering if anybody can share any experience using Heritrix with Solr.
>
> out on a limb here... both Nutch and SOLR use Lucene for the actual
> indexing / searching. Would the indexes generated with Nutch be
> compatible / readable with SOLR?
>
> _________________________
> {Beto|Norberto|Numard} Meijome
>
> "Why do you sit there looking like an envelope without any address on it?"
> Mark Twain
>
> I speak for myself, not my employer. Contents may be hot. Slippery when
> wet. Reading disclaimers makes you go blind. Writing them is worse.
> You have been Warned.